The distributions of the largest and the smallest eigenvalues of a p-variate sample covariance
matrix S are of great importance in statistics. Focusing on the null case where nS follows the
standard Wishart distribution $W_p(I, n)$, we study the accuracy of their scaling limits under the
setting: $n/p \to \gamma \in (0, \infty)$ as $n \to \infty$. The limits here are the orthogonal Tracy-Widom law and
its reflection about the origin.

With carefully chosen rescaling constants, the approximation to the rescaled largest eigenvalue
distribution by the limit attains accuracy of order $O(\min(n,p)^{-2/3})$. If $\gamma > 1$, the same order of
accuracy is obtained for the smallest eigenvalue after incorporating an additional log transform.
Numerical results show that the relative error of approximation at conventional significance
levels is reduced by over 50% in rectangular and over 75% in 'thin' data matrix settings, even
with $\min(n,p)$ as small as 2.

1. Introduction

Understanding the behavior of the extreme eigenvalues of a sample covariance matrix S 
is important in a large number of multivariate statistical problems. As an example, 
consider one of the most common inference problems: testing the null hypothesis that 
the population covariance is identity. Roy's union intersection principle [29] suggests that 
we reject the null hypothesis for large values of the largest eigenvalue of S (or for small 
values of the smallest eigenvalue). Naturally, the next question is: How should the p- value 
be calculated? 

To address this issue, and many others, it is necessary to examine the null distributions 
of the extreme sample eigenvalues. In this paper, we restrict ourselves to the Gaussian 
framework. In particular, let X be an n x p data matrix whose row vectors are i.i.d.
samples from the $N_p(0, I)$ distribution. The p x p matrix A = X'X then follows a standard
Wishart distribution: $A \sim W_p(I, n)$, and is called a (real) white Wishart matrix. The
ordered eigenvalues of A are denoted by $\lambda_1 \ge \cdots \ge \lambda_p$. Our interest lies in $\lambda_1$ and $\lambda_p$, as
A = nS.

The exact evaluation of the marginal distributions of these eigenvalues is difficult, 
even in the null case considered here. See, for example, Muirhead [24], Section 9.7. An
alternative approach is to approximate them by their asymptotic limits. For the problem 
we are concerned with, Anderson [2], Chapter 13, summarized the classical results under 
the conventional asymptotic regime: p holds fixed and n tends to infinity. 

However, for a wide range of modern data sets (microarray data, stock prices, weather 
forecasting, etc.), the number of features p is very large while the number of observa- 
tions n is much smaller than or just comparable to p. For these situations, the classi- 
cal asymptotics is not always appropriate and different asymptotic theories are needed. 
Borrowing tools from random matrix theory, especially those established by Tracy and 
Widom [32-34], Johnstone [15] showed that under the asymptotic regime 

$$p \to \infty, \quad n = n(p) \to \infty \quad \text{and} \quad n/p \to \gamma \in (0, \infty), \tag{1}$$

the largest eigenvalue $\lambda_1$ in A has the weak limit

$$\frac{\lambda_1 - \mu_p}{\sigma_p} \xrightarrow{\ \mathcal{D}\ } F_1, \tag{2}$$

where the centering and scaling constants are defined as

$$\mu_p = (\sqrt{n-1} + \sqrt{p})^2, \qquad \sigma_p = (\sqrt{n-1} + \sqrt{p}) \left( \frac{1}{\sqrt{n-1}} + \frac{1}{\sqrt{p}} \right)^{1/3}. \tag{3}$$

Here $F_1$ denotes the orthogonal Tracy-Widom law [33], the scaling limit of the largest
eigenvalue in real Gaussian Wigner matrices. Slightly prior to [15], as a by-product of his
analysis on the random growth model, Johansson [14] proved that the scaling limit for
the largest eigenvalue in the complex white Wishart matrix is the unitary Tracy-Widom
law $F_2$. Recently, El Karoui [9] extended the asymptotic regime (1) to include the cases
where $n/p \to 0$ or $\infty$. For the smallest eigenvalue, when $\gamma > 1$, Baker et al. [3] showed that
the reflection of $F_2$ about the origin is the scaling limit for complex Wishart matrices,
and Paul [28] gave the Tracy-Widom limits in the case where $n/p \to \infty$ for both complex
and real Wishart matrices.

Although this type of asymptotic result has emerged only recently in the statistics 
literature, it has already found its relevance to applications with modern data. For in- 
stance, based on (2), Patterson et al. [27] developed a formal procedure for testing the 
presence of population heterogeneity with SNP (single nucleotide polymorphism) data. 

From a statistical point of view, to inform the use of any asymptotic result in practice, 
we need to understand how closely the asymptotic limit approximates the finite sample 
distributions. In the motivating example, this dictates the accuracy of the nominal p-value.

In this paper, we first establish a rate of convergence result for the Tracy-Widom
approximation to the distribution of the rescaled largest eigenvalue, but with more
carefully chosen constants than (3). Set $a \wedge b = \min(a, b)$ and $m_\pm = m \pm \frac{1}{2}$. We show that

$$\mu_{n,p} = (\sqrt{n_-} + \sqrt{p_-})^2, \qquad \sigma_{n,p} = (\sqrt{n_-} + \sqrt{p_-}) \left( \frac{1}{\sqrt{n_-}} + \frac{1}{\sqrt{p_-}} \right)^{1/3} \tag{4}$$

results in better approximation. The difference between the distribution of
$(\lambda_1 - \mu_{n,p})/\sigma_{n,p}$ and $F_1$ reduces to the 'second order', being $O((n \wedge p)^{-2/3})$ rather than
$O((n \wedge p)^{-1/3})$, which would apply if (3) were used. See Theorem 1. Numerical work in
Section 2.2.1 suggests that the improvement is substantial.
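
To make the difference between the two sets of constants concrete, the following minimal sketch evaluates (3) and (4) for given n and p and rescales an observed largest eigenvalue. The function names are ours and purely illustrative; only the formulas come from the text.

```python
import math


def tw_constants_old(n, p):
    """Centering and scaling as in (3): based on sqrt(n - 1) and sqrt(p)."""
    a, b = math.sqrt(n - 1), math.sqrt(p)
    mu = (a + b) ** 2
    sigma = (a + b) * (1 / a + 1 / b) ** (1 / 3)
    return mu, sigma


def tw_constants_new(n, p):
    """Centering and scaling as in (4): based on n_- = n - 1/2 and p_- = p - 1/2."""
    a, b = math.sqrt(n - 0.5), math.sqrt(p - 0.5)
    mu = (a + b) ** 2
    sigma = (a + b) * (1 / a + 1 / b) ** (1 / 3)
    return mu, sigma


def rescale(lam1, n, p, constants=tw_constants_new):
    """Return (lam1 - mu) / sigma for the chosen set of constants."""
    mu, sigma = constants(n, p)
    return (lam1 - mu) / sigma


if __name__ == "__main__":
    print(tw_constants_old(8, 2), tw_constants_new(8, 2))
```

Note that (4) is symmetric in n and p, whereas (3) is not; this is one simple way to tell the two apart in code.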

Further assuming $\gamma > 1$ in (1), we find that, with a log transform, the scaling limit
of $\log \lambda_p$ is the reflected Tracy-Widom law $G_1$ (defined by $G_1(s) = 1 - F_1(-s)$) [28].
Moreover, with appropriate rescaling constants, the accuracy of the limit also reaches
the second order: $O(p^{-2/3})$. See Theorem 2 and Section 2.2.2.

In the literature, El Karoui [10] established a parallel result for Johansson's theorem for 
the largest eigenvalue on the complex domain and Choup [6] studied the same problem 
via an Edgeworth expansion approach. Recently, Johnstone [16] obtained both scaling 
limit and convergence rates for the extreme eigenvalues of an F-matrix, on both complex
and real domains. As is usual in the Random Matrix Theory literature, results on the real 
domain are founded in part on those for complex data but require significant additional 
constructs and arguments. This is explained for our setting in Sections 3 and 4. 

The rest of the paper is organized as follows. In Section 2, we present theorems for 
both the largest and the smallest eigenvalues, together with supporting numerical results, 
related statistical settings, a real data example and a brief discussion. Section 3 proves 
the theorem on the largest eigenvalue and Section 4 sketches the proof of the one on 
the smallest eigenvalue. Finally, Section 5 establishes necessary Laguerre polynomial 
asymptotics, which are first used without proof in Section 3. Technical details are collected
in the Appendix. 

2. Main results and their applications 

In this section, we first state two main theorems of this paper, which are concerned 
with the convergence rates of the largest and the smallest eigenvalues in finite Wishart 
matrices to their Tracy-Widom limits. The theorems are then complemented and further
justified by a series of numerical experiments, in which the Tracy-Widom approximation
is reasonably good even when n and/or p are as small as 2. After that, we review several 
related statistical settings and consider a real data example. Finally, we end the section 
with a brief discussion. 

2.1. Main theorems 

We begin with the largest eigenvalue, for which we have the following rate of convergence 

result. 
Theorem 1. Let $A \sim W_p(I, n)$ with $n \ne p$ and $\lambda_1$ its largest eigenvalue. Define $(\mu_{n,p}, \sigma_{n,p})$
as in (4). Under condition (1), for any given $s_0$, there exists an integer $N_0(s_0, \gamma)$, such
that when $n \wedge p \ge N_0(s_0, \gamma)$ and is even, for all $s \ge s_0$,

$$|P\{\lambda_1 \le \mu_{n,p} + \sigma_{n,p} s\} - F_1(s)| \le C(s_0) (n \wedge p)^{-2/3} \exp(-s/2),$$

where $C(\cdot)$ is continuous and non-increasing.

We also obtain an analogous result for the smallest eigenvalue. Refine condition (1) to

$$p \to \infty, \quad p + 1 < n = n(p) \to \infty \quad \text{and} \quad n/p \to \gamma \in (1, \infty). \tag{5}$$

Define $\mu^-_{n,p} = (\sqrt{n_-} - \sqrt{p_-})^2$, $\sigma^-_{n,p} = (\sqrt{n_-} - \sqrt{p_-})(1/\sqrt{p_-} - 1/\sqrt{n_-})^{1/3}$, and let

$$\tau^-_{n,p} = \frac{\sigma^-_{n,p}}{\mu^-_{n,p}}, \qquad \nu^-_{n,p} = \log(\mu^-_{n,p}) + \frac{1}{8} (\tau^-_{n,p})^2. \tag{6}$$

Then we have the following theorem.

Theorem 2. Let $A \sim W_p(I, n)$ with $n - 1 > p$ and $\lambda_p$ as its smallest eigenvalue. Define
$(\nu^-_{n,p}, \tau^-_{n,p})$ as in (6). Under condition (5), we have

$$\frac{\log \lambda_p - \nu^-_{n,p}}{\tau^-_{n,p}} \xrightarrow{\ \mathcal{D}\ } G_1,$$

with $G_1(s) = 1 - F_1(-s)$, the reflected Tracy-Widom law.

In addition, for any given $s_0$, there exists an integer $N_0(s_0, \gamma)$, such that when
$p \ge N_0(s_0, \gamma)$ and is even, for all $s \ge s_0$,

$$|P\{\log \lambda_p \le \nu^-_{n,p} - \tau^-_{n,p} s\} - G_1(-s)| \le C(s_0) p^{-2/3} \exp(-s/2),$$

where $C(\cdot)$ is continuous and non-increasing.

While we only prove rigorous bounds for even p, numerical experiments show that the 
approximation works just as well in the odd case, and for the largest eigenvalue, also in 
the square case. See Tables 1 and 2. 

2.2. Numerical performance 

An important motivation for the current study is to promote practical use of the Tracy- 
Widom approximation. To this end, we conduct here a set of experiments to investigate 
its numerical quality. 

2.2.1. The largest eigenvalue 

Distributional approximation. We first computed the empirical cumulative probabilities
of $\lambda_1$ (after rescaling), at a collection of $F_1$ percentiles, using R = 40000 replications.
This is done for three different categories of (n,p) combinations: (1) the square case,
where n = p = 2, 5, 25 and 100; (2) the rectangular case, where p = 2, 5, 25 and 100 and
n/p is fixed at 4:1; (3) the 'thin' case, where p = 5 and 10 but n/p could be as high as
100:1 and 1000:1. In some sense, this category could also be thought of as in the situation
where $n/p \to \infty$ as discussed in [9]. For comparison purposes, we rescaled $\lambda_1$ using both
the new constants (4) and the old ones (3). The results are summarized in Table 1.

Table 1. Simulations for finite n x p vs. Tracy-Widom limit: the largest eigenvalue. For each
(n,p) combination, we show in the first row empirical cumulative probabilities for $\lambda_1$, rescaled
by (4), and in the second row, in parentheses, rescaled by (3), both computed from R = 40000
repeated draws from $W_p(I, n)$ using the method in [7]. Conventional significance levels correspond
to the last three columns and the last row gives approximate standard errors based on binomial
sampling. $F_1$ was computed by the method in [8] with percentiles obtained via inverse interpolation

Percentiles    -3.8954  -3.1804  -2.7824  -1.9104  -1.2686  -0.5923   0.4501   0.9793   2.0234
TW                0.01     0.05     0.10     0.30     0.50     0.70     0.90     0.95     0.99

2 x 2            0.000    0.000    0.000    0.034    0.379    0.690    0.908    0.953    0.988
               (0.000)  (0.000)  (0.000)  (0.015)  (0.345)  (0.669)  (0.902)  (0.950)  (0.987)
5 x 5            0.000    0.002    0.021    0.218    0.465    0.702    0.908    0.954    0.989
               (0.000)  (0.002)  (0.020)  (0.213)  (0.460)  (0.698)  (0.907)  (0.953)  (0.989)
25 x 25          0.003    0.031    0.075    0.280    0.492    0.700    0.902    0.951    0.990
               (0.003)  (0.030)  (0.075)  (0.280)  (0.491)  (0.699)  (0.902)  (0.951)  (0.990)
100 x 100        0.007    0.041    0.091    0.294    0.501    0.704    0.902    0.951    0.990
               (0.007)  (0.041)  (0.091)  (0.294)  (0.501)  (0.704)  (0.902)  (0.951)  (0.990)

8 x 2            0.000    0.001    0.012    0.196    0.456    0.702    0.909    0.955    0.990
               (0.000)  (0.004)  (0.031)  (0.270)  (0.532)  (0.754)  (0.928)  (0.964)  (0.992)
20 x 5           0.001    0.018    0.054    0.259    0.483    0.704    0.906    0.954    0.990
               (0.002)  (0.028)  (0.073)  (0.303)  (0.531)  (0.737)  (0.921)  (0.962)  (0.992)
100 x 25         0.006    0.040    0.088    0.292    0.498    0.701    0.901    0.950    0.989
               (0.008)  (0.047)  (0.100)  (0.314)  (0.523)  (0.721)  (0.910)  (0.955)  (0.991)
400 x 100        0.009    0.048    0.096    0.299    0.502    0.702    0.902    0.951    0.990
               (0.010)  (0.053)  (0.104)  (0.312)  (0.516)  (0.714)  (0.908)  (0.954)  (0.991)

500 x 5          0.010    0.049    0.098    0.296    0.502    0.705    0.906    0.955    0.990
               (0.020)  (0.083)  (0.150)  (0.385)  (0.589)  (0.772)  (0.933)  (0.969)  (0.994)
1000 x 10        0.010    0.051    0.101    0.300    0.504    0.707    0.902    0.952    0.991
               (0.017)  (0.077)  (0.138)  (0.366)  (0.571)  (0.757)  (0.923)  (0.963)  (0.994)
5000 x 5         0.012    0.056    0.107    0.307    0.509    0.707    0.905    0.953    0.992
               (0.027)  (0.097)  (0.169)  (0.402)  (0.602)  (0.779)  (0.933)  (0.969)  (0.994)
10000 x 10       0.012    0.055    0.108    0.308    0.504    0.706    0.905    0.954    0.991
               (0.021)  (0.084)  (0.150)  (0.378)  (0.580)  (0.763)  (0.929)  (0.967)  (0.994)

2 x SE           0.001    0.002    0.003    0.005    0.005    0.005    0.003    0.002    0.001
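
A simulation of this kind can be sketched with the Python standard library alone when p = 2, since the largest eigenvalue of a symmetric 2 x 2 matrix has a closed form. The code below is only an illustration under that restriction: it draws X directly rather than using the sampling method of [7], and the function names, seed and number of replications are our own choices.

```python
import math
import random


def tw_constants_new(n, p):
    """Centering and scaling constants as in (4)."""
    a, b = math.sqrt(n - 0.5), math.sqrt(p - 0.5)
    return (a + b) ** 2, (a + b) * (1 / a + 1 / b) ** (1 / 3)


def largest_eigenvalue_p2(n, rng):
    """Largest eigenvalue of A = X'X for an n x 2 matrix X of i.i.d. N(0, 1) entries."""
    a = b = c = 0.0
    for _ in range(n):
        x, y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        a += x * x
        b += x * y
        c += y * y
    # Closed form for the symmetric 2 x 2 matrix [[a, b], [b, c]].
    return 0.5 * (a + c) + math.sqrt(0.25 * (a - c) ** 2 + b * b)


def empirical_cdf_at(s, n, reps=20000, seed=0):
    """Monte Carlo estimate of P{lambda_1 <= mu + sigma * s} for p = 2."""
    rng = random.Random(seed)
    mu, sigma = tw_constants_new(n, 2)
    threshold = mu + sigma * s
    hits = sum(largest_eigenvalue_p2(n, rng) <= threshold for _ in range(reps))
    return hits / reps


# Evaluate at the F_1 percentile 0.4501 (nominal level 0.90) for the 8 x 2 case.
prob = empirical_cdf_at(0.4501, n=8)
```

For the 8 x 2 case, the estimate can be compared with the corresponding entry of Table 1, up to Monte Carlo error.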

Numerical accuracy with the new constants could be viewed from two aspects. First,
for the conventional significance levels of 10%, 5% and 1% that correspond to right tails
of the distributions, the approximation looks good even when p is as small as 2. In
addition, it improves as p becomes larger and starts to match the finite distributions
almost exactly once p is at least 25. See the last three columns of Table 1.
Second, when p is large, for instance, in the 100 x 100 and 400 x 100 cases, $F_1$ provides
reasonable approximation over the whole range of interest.

Table 2. Simulations for finite n x p vs. Tracy-Widom limit: the smallest eigenvalue. For each
(n,p) combination, empirical cumulative probabilities are computed for $(\log \lambda_p - \nu^-_{n,p})/\tau^-_{n,p}$ using
R = 40000 draws from $W_p(I, n)$. Methods for sampling, computing $F_1$ and obtaining percentiles
are the same as in Table 1. Conventional significance levels correspond to the last three columns
and the last line gives approximate standard errors based on binomial sampling

Percentiles     3.8954   3.1804   2.7824   1.9104   1.2686   0.5923  -0.4501  -0.9793  -2.0234
RTW               0.99     0.95     0.90     0.70     0.50     0.30     0.10     0.05     0.01

4 x 2            1.000    1.000    0.998    0.893    0.625    0.326    0.087    0.041    0.009
10 x 5           0.999    0.995    0.976    0.798    0.555    0.310    0.095    0.047    0.011
50 x 25          0.997    0.973    0.931    0.728    0.515    0.302    0.097    0.048    0.010
200 x 100        0.993    0.960    0.913    0.713    0.509    0.306    0.103    0.050    0.010

8 x 2            1.000    0.992    0.969    0.792    0.554    0.314    0.095    0.046    0.010
20 x 5           0.999    0.977    0.939    0.740    0.522    0.301    0.096    0.047    0.009
100 x 25         0.993    0.960    0.915    0.713    0.505    0.298    0.098    0.048    0.009
400 x 100        0.992    0.954    0.904    0.701    0.500    0.298    0.100    0.049    0.010

2 x SE           0.001    0.002    0.003    0.005    0.005    0.005    0.003    0.002    0.001

As regards the comparison between different rescaling constants, neither choice seems 
superior to the other in the square cases (see the first block of Table 1). However, when 
the ratio n/p is changed to 4:1 or higher (see the second and the third blocks), the 
improvement by using new constants (4) is self-evident. 

As a remark, better performance on right tails and improvement by using the new con- 
stants, as reflected in this simulation study, agree well with the mathematical statement 
in Theorem 1. 

Approximate percentiles. We can also use $F_1$ to calculate approximate percentiles
for the finite distributions, whose accuracy can be measured by the relative error
$r_\alpha = \theta^{TW}_\alpha / \theta_\alpha - 1$. Here, $\theta_\alpha$ is the exact $100\alpha$th percentile of the rescaled largest
eigenvalue in the finite n x p model and $\theta^{TW}_\alpha$ is its counterpart from $F_1$.

In Figure 1, we plot $r_\alpha$ for $\alpha = 0.95$ and 0.99, with p ranging from 2 to 5 and n from 2
to 50. Although $n \wedge p$ is no greater than 5, the approximation is reasonably satisfactory.
For the 95th percentile, $|r_{0.95}|$ ranges from 5% to 10% for most cases and slightly
exceeds 10% only when p = 2 and the n/p ratio is high. The approximation works even
better for the 99th percentile, with $|r_{0.99}| < 5\%$ for most cases. Due to computational
limitation [20], we could not obtain exact percentiles when n and p are large. We
expect the approximate percentiles to become more accurate as a consequence of better
distributional approximation.
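
As an illustration of how such approximate percentiles might be formed in practice, the sketch below maps the upper $F_1$ percentiles tabulated in Table 1 back to the original eigenvalue scale via $\mu_{n,p} + \sigma_{n,p} s_\alpha$ and evaluates the relative error defined above. Working on the original scale is our assumption for the example, and the exact percentile passed to `relative_error` must come from elsewhere (for instance, the software of [20]); no exact values are supplied here.

```python
import math

# Upper percentiles of F_1 as tabulated in Table 1 (nominal level -> percentile).
F1_PERCENTILES = {0.90: 0.4501, 0.95: 0.9793, 0.99: 2.0234}


def tw_constants_new(n, p):
    """Centering and scaling constants as in (4)."""
    a, b = math.sqrt(n - 0.5), math.sqrt(p - 0.5)
    return (a + b) ** 2, (a + b) * (1 / a + 1 / b) ** (1 / 3)


def approx_percentile(n, p, alpha):
    """Tracy-Widom approximation to the 100*alpha-th percentile of lambda_1 (original scale)."""
    mu, sigma = tw_constants_new(n, p)
    return mu + sigma * F1_PERCENTILES[alpha]


def relative_error(theta_tw, theta_exact):
    """r_alpha = theta_tw / theta_exact - 1."""
    return theta_tw / theta_exact - 1.0
```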
Figure 1. Plots of relative errors $r_\alpha$ for approximate percentiles using $F_1$: (a) 95th percentile;
(b) 99th percentile. Exact finite n x p distributions are computed in MATLAB using Koev's
implementation [20] and $F_1$ is computed using the method in [8]. The percentiles are obtained
from inverse interpolation.

2.2.2. The smallest eigenvalue 

For the smallest eigenvalue, we perform a simulation study to investigate the
distributional approximation. We chose two n/p ratios: 2:1 and 4:1, both with p = 2, 5, 25
and 100. For each (n,p) combination, we used R = 40000 replications. The simulation
results shown in Table 2 demonstrate similar performance as in the case of the largest
eigenvalue and agree well with Theorem 2.



2.3. Related statistical settings 

Here, we review several settings in multivariate statistics to which our results are appli- 
cable. Throughout the subsection, we only use the largest eigenvalue to illustrate. 

Principal component analysis 

Suppose that $X = [X_1, \ldots, X_n]'$ is a Gaussian data matrix. Write the sample covariance
matrix $S = (n-1)^{-1} X'HX$, where $H = I - n^{-1} \mathbf{1}\mathbf{1}'$ is the centering matrix. Principal
component analysis (PCA) looks for a sequence of standardized vectors $a_1, \ldots, a_p$ in $\mathbb{R}^p$,
such that $a_i$ successively solves the following optimization problem:

$$\max\{a'Sa : a'a_j = 0,\ j < i\},$$

where $a_0$ is the zero vector. Then, successive sample eigenvalues $\ell_1 \ge \cdots \ge \ell_p$ satisfy
$\ell_i = a_i' S a_i$.

One basic question in PCA application is testing the hypothesis of isotropic variation,
that is, the population covariance matrix $\Sigma = \tau^2 I$. For simplicity, assume that $\tau^2 = 1$
(otherwise we divide S by $\tau^2$). Then $(n-1)S \sim W_p(I, n-1)$. The largest eigenvalue $\ell_1$
of S is a natural test statistic under the union intersection principle. Our result applies
to $(n-1)\ell_1$. If $\tau^2$ is unknown, we could estimate it by $\operatorname{tr} S/p$. See [25].

Testing that a covariance matrix equals a specified matrix 

Suppose that $X = [X_1, \ldots, X_n]'$ has as its row vectors i.i.d. samples from the $N_p(\mu, \Sigma)$
distribution. We want to test the hypothesis $H_0$: $\Sigma = \Sigma_0$, where $\Sigma_0$ is a specified positive
definite matrix.

Suppose $\mu$ is unknown, and let $S = (n-1)^{-1} X'HX$ be the sample covariance matrix.
The union intersection test uses the largest eigenvalue of $\Sigma_0^{-1} S$, denoted by $\ell_1(\Sigma_0^{-1} S)$, as
the test statistic [23], page 130. Observe that $\ell_1(\Sigma_0^{-1} S) = \ell_1(\Sigma_0^{-1/2} S \Sigma_0^{-1/2})$. Under $H_0$,
$(n-1)\Sigma_0^{-1/2} S \Sigma_0^{-1/2} \sim W_p(I, n-1)$. So, our result is available for $(n-1)\ell_1(\Sigma_0^{-1} S)$.

Singular value decomposition 

For X a real n x p matrix, there exist orthogonal matrices U (n x n) and V (p x p), such
that

$$X = UDV^T,$$

where $D = \operatorname{diag}(d_1, \ldots, d_{n \wedge p}) \in \mathbb{R}^{n \times p}$ and $d_1 \ge \cdots \ge d_{n \wedge p} \ge 0$. This representation is
called the singular value decomposition of X [13], Theorem 7.3.5, with $d_i$ the ith singular
value of X. Theorem 1 then provides an accurate distributional approximation for $d_1^2$
when the entries of X are independent standard normal random variables.

2.4. The score data example 

We consider now the score data example extracted from [23] . The data set consists of the 
scores of 88 students on 5 subjects (mechanics, vectors, algebra, analysis and statistics). 
Taking account of centering, we have n = 87 and p = 5. 

One might expect that there are several common factors that determine the students'
performance on the tests. Moreover, one might assume that the joint effects of the
common factors are observed in isotropic noises, in which case the covariance structure of
the scores (after proper diagonalization) follows a spiked model $\Sigma = \tau^2 \Sigma_m$, where $\tau^2 > 0$,
$\Sigma_m = \operatorname{diag}(\ell_1, \ldots, \ell_m, 1, \ldots, 1)$ and $0 \le m \le 4$. (Note that the model $\Sigma = \tau^2 \Sigma_4$ is the
saturated model and is indistinguishable from $\Sigma = \tau^2 \Sigma_5$.) To determine m, we are led to
test a nested sequence of hypotheses $H_k$: $\Sigma = \tau^2 \Sigma_m$ with some $m \le k$, for $0 \le k \le 3$.

To compute the p-value of testing $H_k$, we could (i) estimate $\tau^2$ by $\hat\tau^2_{p-k}$, the
mean of the p - k smallest sample eigenvalues; (ii) construct the test statistic as
$T_k = (n \ell_{k+1}/\hat\tau^2_{p-k} - \mu_{n,p-k})/\sigma_{n,p-k}$; (iii) report the upper tail probability $1 - F_1(T_k)$ as the
approximate conservative p-value. Step (iii) is justified as follows. Let $\mathcal{L}(\lambda_j \mid n, p, \Sigma)$ denote
the law of the jth largest sample eigenvalue of a $W_p(\Sigma, n)$ matrix. By the interlacing properties
of the eigenvalues [13], Theorem 7.3.9 (see also [15], Proposition 1.2), $\mathcal{L}(\lambda_1 \mid n, p-m, I_{p-m})$ could be
used to compute the conservative p-value for the null distribution $\mathcal{L}(\lambda_{k+1} \mid n, p, \Sigma_m)$ for
all $k \ge m$, which is further approximated by $F_1$. We summarize the values of $T_k$ and the
corresponding p-values in Table 3.

Table 3. The test statistics $T_k$ and the corresponding p-values $1 - F_1(T_k)$ calculated
using new centering and scaling constants (4) and old constants (3) for the score data

                     H_0          H_1              H_2       H_3
T_k (new)            14.5934      4.3162           0.4535    1.4949
p-value (new)        < 10^{-6}    1.1 x 10^{-4}    0.0996    0.0235
T_k (old)            14.4740      4.1155           0.1803    1.1897
p-value (old)        < 10^{-6}    1.7 x 10^{-4}    0.1376    0.0371
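
When no Tracy-Widom routine is at hand, a rough upper tail probability can be read off the nine $F_1$ percentiles listed in Table 1 by linear interpolation. The sketch below is a crude stand-in of our own, not the method of [8]: it clamps outside the tabulated range and can be noticeably off in the tails, where $F_1$ is far from linear.

```python
# (percentile, cumulative probability) pairs for F_1, copied from Table 1.
F1_TABLE = [
    (-3.8954, 0.01), (-3.1804, 0.05), (-2.7824, 0.10),
    (-1.9104, 0.30), (-1.2686, 0.50), (-0.5923, 0.70),
    (0.4501, 0.90), (0.9793, 0.95), (2.0234, 0.99),
]


def f1_interp(s):
    """Piecewise-linear approximation to F_1(s); clamps outside the tabulated range."""
    if s <= F1_TABLE[0][0]:
        return F1_TABLE[0][1]
    if s >= F1_TABLE[-1][0]:
        return F1_TABLE[-1][1]
    for (s0, q0), (s1, q1) in zip(F1_TABLE, F1_TABLE[1:]):
        if s0 <= s <= s1:
            return q0 + (q1 - q0) * (s - s0) / (s1 - s0)


def approx_p_value(t_stat):
    """Approximate upper tail probability 1 - F_1(t_stat)."""
    return 1.0 - f1_interp(t_stat)
```

For instance, `approx_p_value(0.4535)` is close to the $H_2$ entry (new constants) of Table 3, because 0.4535 lies very near a tabulated percentile; for statistics deep in the tail, the clamping only yields the bound 0.01.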

From Table 3, we could see a noticeable difference between the values of $T_k$ and the
corresponding p-values by using different rescaling constants. The p-values obtained from
the new constants are typically smaller than those from the old constants. Noting that
the p-values are already conservative, the new constants (4) prevent further unnecessary
conservativeness that would otherwise be caused by the old constants in this example.

2.5. Discussion 

We discuss below two issues related to our results. 

Log transform 

One notable difference between Theorems 1 and 2 is the logarithmic transformation of 
the smallest eigenvalue before scaling. 

Indeed, for the largest eigenvalue, a similar $O(N^{-2/3})$ convergence rate can be obtained
for the distribution of $(\log \lambda_1 - \nu_{n,p})/\tau_{n,p}$, with $\nu_{n,p} = \log(\mu_{n,p})$ and $\tau_{n,p} = \sigma_{n,p}/\mu_{n,p}$.
However, when n or p is small, its numerical results are not as good as those obtained from
direct scaling. In comparison, for the smallest eigenvalue, the transform yields substantial
numerical improvement. Therefore, we recommend the log transform for the smallest
eigenvalue.

As no theoretical analysis justifying the choice of the transform is currently available,
we attempt some heuristics in the following. First, observe that sample covariance
matrices are positive semidefinite. So, for $\lambda_p$, the hard lower bound at 0 truncates the left tail
of its density function on any linear scale, and hence obstructs the asymptotic
approximation by $G_1$ that is supported on the whole real line. However, by a map $x \mapsto \log x$,
we map the support to the whole real line and avoid the 'hard edge' effect. The largest
eigenvalue does not necessarily benefit from this transform, for it is on the 'soft edge',
that is, the right edge of the covariance matrix spectrum, which does not have a
deterministic upper bound. Such heuristics are supported by related studies on Gaussian
Wigner matrices [17] and F-matrices [16].
Software

There have been works on the numerical evaluation of the Tracy-Widom
distributions [4, 5, 8] and the exact finite n x p distributions of the extreme eigenvalues [19, 20]. In
addition, the author and colleagues have developed an R package RMTstat [18] that is
intended to provide an interface for using the Tracy-Widom approximation in multivariate
statistical analysis.

3. The largest eigenvalue 

This section is devoted to the proof of Theorem 1. We use the operator norm conver- 
gence framework developed in [35] , for the joint eigenvalue distribution of white Wishart 
matrices is essentially the same as the Laguerre orthogonal ensemble in random matrix 
theory (RMT). 

In the proof, we first give the determinantal representations for the finite and lim- 
iting distribution functions and work out explicit formulas for related kernels, in which 
Widom's formula (12) plays the central role. Then, a Lipschitz-type inequality shows that 
the difference in determinants is bounded by the difference in kernels. The
representation of the finite sample kernel involves weighted generalized Laguerre polynomials, while
that of the limiting kernel uses the Airy function. A decomposition of the kernel difference
then enables us to transfer bounds on the convergence of Laguerre polynomials to the Airy
function to bounds on the kernel difference and eventually to bounds on the difference
of the probabilities.

3.1. Determinantal laws 

Following RMT notational convention, we replace the dimension parameter p of a white
Wishart matrix A by N, and use $x_i$ instead of $\lambda_i$ to denote its eigenvalues. Henceforth,
we assume that N is even, $n = n(N) \ge N + 1$ and $n/N \to \gamma \in [1, \infty)$ as $N \to \infty$. The
cases $\gamma \in (0, 1]$ are easily obtained by interchanging n and N.

In the RMT literature, for an integer $N \ge 2$ and any $\alpha > -1$, the Laguerre orthogonal
ensemble with parameters N and $\alpha$, denoted by LOE(N, $\alpha$), refers to the joint eigenvalue
density

$$p_N(x_1, \ldots, x_N) = \frac{1}{Z_{N,\alpha}} \prod_{1 \le j < k \le N} (x_j - x_k) \prod_{j=1}^{N} x_j^{\alpha/2} e^{-x_j/2}, \tag{7}$$

where $x_1 \ge \cdots \ge x_N \ge 0$. If further $\alpha$ is a non-negative integer, (7) matches the density
function of ordered eigenvalues $x_1 \ge \cdots \ge x_N \ge 0$ from a white Wishart matrix
$A \sim W_N(I, n)$, with

$$\alpha = n - N - 1. \tag{8}$$

Henceforth, we identify the LOE(N, $\alpha$) model with eigenvalues of $A \sim W_N(I, n)$ by (8).
Thinking of $\alpha$ and n as functions of N, in what follows we sometimes drop explicit
dependence of certain quantities on them.
For LOE(N, $\alpha$), [34], Section 9, features the following determinantal formula

$$F_{N,1}(x') = P\{x_1 \le x'\} = \sqrt{\det(I - K_N \chi)}. \tag{9}$$

Here $\chi = \mathbf{1}_{x > x'}$ and $K_N$ is an operator with 2 x 2 matrix kernel

$$K_N(x, y) = (L S_{N,1})(x, y) + K^{\varepsilon}(x, y), \tag{10}$$

where

$$L = \begin{pmatrix} I & -\partial_2 \\ \varepsilon_1 & T \end{pmatrix}, \qquad K^{\varepsilon}(x, y) = \begin{pmatrix} 0 & 0 \\ -\varepsilon(x - y) & 0 \end{pmatrix}.$$

In L, $\partial_2$ is the differential operator with respect to the second argument, $\varepsilon_1$ is the
convolution operator acting on the first argument with the kernel $\varepsilon(x - y) = \frac{1}{2}\operatorname{sgn}(x - y)$
and $TK(x, y) = K(y, x)$ for any kernel K.

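To give an explicit formula for $S_{N,1}$, introduce the generalized Laguerre polynomials
$\{L_k^{\alpha}\}_{k=0}^{\infty}$ ([31], Chapter V), which are orthogonal on $[0, \infty)$ with weight function $x^{\alpha} e^{-x}$.
The normalized and weighted versions of them become

$$\phi_k(x; \alpha) = h_k^{-1/2} x^{\alpha/2} e^{-x/2} L_k^{\alpha}(x), \qquad k = 0, 1, \ldots, \tag{11}$$

with $h_k = \int_0^{\infty} L_k^{\alpha}(x)^2 x^{\alpha} e^{-x}\,dx = (k + \alpha)!/k!$. Widom [36] derived a formula for $S_{N,1}$,
which can be rewritten in a form more convenient to us [1], equation (4.3), as

$$S_{N,1}(x, y) = S_{N,2}(x, y) + \frac{N!}{4\Gamma(N + \alpha)} x^{\alpha/2} e^{-x/2} \left[ \frac{d}{dx} L_N^{\alpha}(x) \right] \int_0^{\infty} \operatorname{sgn}(y - z) z^{\alpha/2 - 1} e^{-z/2} [L_N^{\alpha}(z) - L_{N-1}^{\alpha}(z)]\,dz, \tag{12}$$

where $S_{N,2}$ is the unitary correlation kernel

$$S_{N,2}(x, y) = \sum_{k=0}^{N-1} \phi_k(x; \alpha) \phi_k(y; \alpha).$$

Let $a_N = \sqrt{N(N + \alpha)}$, and define as in [10], Section 2, functions

$$\phi(x; \alpha) = (-1)^{N} \sqrt{\frac{a_N}{2}}\, \phi_N(x; \alpha - 1) x^{-1/2} \mathbf{1}_{x > 0}, \qquad \psi(x; \alpha) = (-1)^{N-1} \sqrt{\frac{a_N}{2}}\, \phi_{N-1}(x; \alpha + 1) x^{-1/2} \mathbf{1}_{x > 0}. \tag{13}$$

Write $a \circ b$ for the operator with kernel $(a \circ b)(x, y) = \int_0^{\infty} a(x + z) b(y + z)\,dz$. Then $S_{N,2}$
has the integral representation [10, 15]

$$S_{N,2}(x, y) = \int_0^{\infty} [\phi(x + z)\psi(y + z) + \psi(x + z)\phi(y + z)]\,dz = (\phi \circ \psi + \psi \circ \phi)(x, y). \tag{14}$$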
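
The normalization in (11) can be checked numerically. The sketch below evaluates $L_k^{\alpha}$ by the standard three-term recurrence and verifies by the trapezoidal rule that the functions $\phi_k(\cdot; \alpha)$ are approximately orthonormal on $[0, \infty)$. The truncation point and step count are arbitrary choices of ours that are adequate only for small k and $\alpha$.

```python
import math


def laguerre(k, alpha, x):
    """Generalized Laguerre polynomial L_k^alpha(x) via the three-term recurrence."""
    prev, cur = 0.0, 1.0
    for j in range(k):
        # (j + 1) L_{j+1} = (2j + 1 + alpha - x) L_j - (j + alpha) L_{j-1}
        prev, cur = cur, ((2 * j + 1 + alpha - x) * cur - (j + alpha) * prev) / (j + 1)
    return cur


def phi_k(k, alpha, x):
    """Normalized, weighted Laguerre function as in (11)."""
    h_k = math.gamma(k + alpha + 1) / math.factorial(k)
    return x ** (alpha / 2) * math.exp(-x / 2) * laguerre(k, alpha, x) / math.sqrt(h_k)


def inner_product(j, k, alpha, upper=80.0, steps=16000):
    """Trapezoidal approximation to the integral of phi_j * phi_k over [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * phi_k(j, alpha, x) * phi_k(k, alpha, x)
    return total * h
```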
By [31], equations (5.1.13) and (5.1.14), the second term on the right-hand side of (12)
equals

$$\psi(x) \int_0^{\infty} \varepsilon(y - z) \phi(z)\,dz = \psi(x)(\varepsilon\phi)(y).$$

Hence, we obtain

$$S_{N,1}(x, y) = S_{N,2}(x, y) + \psi(x)(\varepsilon\phi)(y) \tag{15}$$

with $S_{N,2}(x, y)$ given in (14). Together with (9) and (10), this gives the determinantal
representation of the finite sample distribution on the original scale.
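
The reduction from (12) to (15) rests on two classical Laguerre identities, $L_N^{\alpha} - L_{N-1}^{\alpha} = L_N^{\alpha - 1}$ and $\frac{d}{dx} L_N^{\alpha} = -L_{N-1}^{\alpha + 1}$; we assume, without having checked [31], that these are the identities being cited. The sketch below confirms both numerically at a few points, using a central finite difference for the derivative.

```python
def laguerre(k, alpha, x):
    """Generalized Laguerre polynomial L_k^alpha(x) via the three-term recurrence."""
    prev, cur = 0.0, 1.0
    for j in range(k):
        prev, cur = cur, ((2 * j + 1 + alpha - x) * cur - (j + alpha) * prev) / (j + 1)
    return cur


def difference_identity_gap(n_deg, alpha, x):
    """|L_N^alpha - L_{N-1}^alpha - L_N^{alpha-1}| at x; should be near zero."""
    lhs = laguerre(n_deg, alpha, x) - laguerre(n_deg - 1, alpha, x)
    return abs(lhs - laguerre(n_deg, alpha - 1, x))


def derivative_identity_gap(n_deg, alpha, x, h=1e-5):
    """|d/dx L_N^alpha + L_{N-1}^{alpha+1}| at x, with a central difference for d/dx."""
    deriv = (laguerre(n_deg, alpha, x + h) - laguerre(n_deg, alpha, x - h)) / (2 * h)
    return abs(deriv + laguerre(n_deg - 1, alpha + 1, x))
```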

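The Tracy-Widom limit has a corresponding determinantal representation [35]

$$F_1(s') = \sqrt{\det(I - K_{GOE} f)}, \tag{16}$$

where $f = \mathbf{1}_{s > s'}$ and the operator $K_{GOE}$ has the matrix kernel

$$K_{GOE}(s, t) = \begin{pmatrix} S(s, t) & SD(s, t) \\ IS(s, t) & S(t, s) \end{pmatrix} + K^{\varepsilon}(s, t).$$

Introduce the right tail integration operator $\tilde\varepsilon$ as in [16], where $(\tilde\varepsilon g)(s) = \int_s^{\infty} g(u)\,du$ and
for kernel K(s, t), $(\tilde\varepsilon_1 K)(s, t) = \int_s^{\infty} K(u, t)\,du$. Also write $a \otimes b$ for the rank one operator
with kernel $(a \otimes b)(s, t) = a(s) b(t)$. Then the entries of $K_{GOE}$ are

$$S(s, t) = (S_A - \tfrac{1}{2} \mathrm{Ai} \otimes \tilde\varepsilon \mathrm{Ai})(s, t) + \tfrac{1}{2} \mathrm{Ai}(s),$$
$$SD(s, t) = -\partial_2 (S_A - \tfrac{1}{2} \mathrm{Ai} \otimes \tilde\varepsilon \mathrm{Ai})(s, t), \tag{17}$$
$$IS(s, t) = -\tilde\varepsilon_1 (S_A - \tfrac{1}{2} \mathrm{Ai} \otimes \tilde\varepsilon \mathrm{Ai})(s, t) - \tfrac{1}{2} (\tilde\varepsilon \mathrm{Ai})(s) + \tfrac{1}{2} (\tilde\varepsilon \mathrm{Ai})(t).$$

Here $S_A(s, t) = (\mathrm{Ai} \circ \mathrm{Ai})(s, t)$ is the Airy kernel, and $\mathrm{Ai}(\cdot)$ is the Airy function ([26],
page 53, equation (8.01)).

Let $G = \frac{1}{\sqrt{2}} \mathrm{Ai}$, and define matrix operators

$$\tilde L = \begin{pmatrix} I & -\partial_2 \\ -\tilde\varepsilon_1 & T \end{pmatrix}, \qquad L_1 = \begin{pmatrix} I & 0 \\ -\tilde\varepsilon_1 & 0 \end{pmatrix}, \qquad L_2 = \begin{pmatrix} 0 & 0 \\ \tilde\varepsilon_2 & I \end{pmatrix}.$$

We can write $K_{GOE}$ in a compact form as

$$K_{GOE} = \tilde L(S_A - G \otimes \tilde\varepsilon G) + L_1(G \otimes \tfrac{1}{\sqrt{2}}) + L_2(\tfrac{1}{\sqrt{2}} \otimes G) + K^{\varepsilon}. \tag{18}$$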

3.2. Rescaling the finite sample kernel 

Under the current RMT notation, the rescaling constants (4) are translated to

$$\mu_{n,N} = (\sqrt{n_-} + \sqrt{N_-})^2, \qquad \sigma_{n,N} = (\sqrt{n_-} + \sqrt{N_-}) \left( \frac{1}{\sqrt{n_-}} + \frac{1}{\sqrt{N_-}} \right)^{1/3}. \tag{19}$$

Introduce the linear transformation $\tau(s) = \mu_{n,N} + s\sigma_{n,N}$ and let $\tilde F_{N,1}(\cdot) = F_{N,1}(\tau(\cdot))$ be
the distribution function of $\tau^{-1}(x_1)$, that is, the largest eigenvalue of $A \sim W_N(I, n)$,
rescaled by (19).
Define the rescaled kernel $K_\tau$ as

$$K_\tau(s, t) = \sqrt{\tau'(s)\tau'(t)}\, K_N(\tau(s), \tau(t)) = \sigma_{n,N} K_N(\tau(s), \tau(t)). \tag{20}$$

Since $K_N$ and $K_\tau$ share the spectrum, $\tilde F_{N,1}(s') = \sqrt{\det(I - K_\tau f)}$.

To work out a representation for $K_\tau$, apply the $\tau$-scaling to $\phi$, $\psi$ and $S_{N,2}$ to define

$$\phi_\tau(s) = \sigma_{n,N} \phi(\mu_{n,N} + s\sigma_{n,N}), \qquad \psi_\tau(s) = \sigma_{n,N} \psi(\mu_{n,N} + s\sigma_{n,N}) \tag{21}$$

and

$$S_\tau(s, t) = \sigma_{n,N} S_{N,2}(\mu_{n,N} + s\sigma_{n,N}, \mu_{n,N} + t\sigma_{n,N}) = (\phi_\tau \circ \psi_\tau + \psi_\tau \circ \phi_\tau)(s, t). \tag{22}$$

Then we obtain from (15) that

$$S_\tau^{R}(s, t) = \sqrt{\tau'(s)\tau'(t)}\, S_{N,1}(\tau(s), \tau(t)) = S_\tau(s, t) + \psi_\tau(s)(\varepsilon\phi_\tau)(t). \tag{23}$$

This, together with (10) and (20), leads to

$$K_\tau(s, t) = \begin{pmatrix} I & -\sigma_{n,N}^{-1} \partial_2 \\ \sigma_{n,N} \varepsilon_1 & T \end{pmatrix} S_\tau^{R}(s, t) + \sigma_{n,N} K^{\varepsilon}(s, t).$$

Observe that $\det(I - K_\tau f)$ remains unchanged if we divide the lower left entry by $\sigma_{n,N}$
and multiply the upper right entry by $\sigma_{n,N}$. Thus, we obtain

$$\tilde F_{N,1}(s') = \sqrt{\det(I - K_\tau f)} \tag{24}$$

with

$$K_\tau(s, t) = (L S_\tau^{R})(s, t) + K^{\varepsilon}(s, t). \tag{25}$$

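To match the representation (18) of $K_{GOE}$, and to facilitate later arguments, it is
helpful to rewrite $L S_\tau^{R}$, and hence $K_\tau$, using $\tilde\varepsilon$. To this end, observe that $\int \psi_\tau = 0$ and let

$$\beta_N = \frac{1}{2} \int_{-\infty}^{\infty} \phi_\tau(s)\,ds. \tag{26}$$

By the identity $(\varepsilon g)(s) = \frac{1}{2} \int g - (\tilde\varepsilon g)(s)$, we obtain $\varepsilon\phi_\tau = \beta_N - \tilde\varepsilon\phi_\tau$ and $\varepsilon\psi_\tau = -\tilde\varepsilon\psi_\tau$,
and so

$$L S_\tau^{R} = L(S_\tau - \psi_\tau \otimes \tilde\varepsilon\phi_\tau) + \beta_N L(\psi_\tau \otimes 1).$$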
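
The identity relating $\varepsilon$ and $\tilde\varepsilon$ is elementary: splitting $\int \frac{1}{2}\operatorname{sgn}(s - u) g(u)\,du$ at $u = s$ gives $\frac{1}{2}\int g - \int_s^{\infty} g$. As a quick numerical sanity check, the sketch below evaluates both sides by the trapezoidal rule for a standard normal density g; the test function and integration limits are our own choices for illustration.

```python
import math

LO, HI = -12.0, 12.0  # effective support of the test function g below


def g(u):
    """Standard normal density, used only as a convenient integrable test function."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def integrate(f, a, b, steps=20000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


def eps(s):
    """(eps g)(s): convolution of g with the kernel (1/2) sgn(s - u)."""
    return 0.5 * (integrate(g, LO, s) - integrate(g, s, HI))


def eps_tilde(s):
    """(eps-tilde g)(s): right tail integral of g from s."""
    return integrate(g, s, HI)


def identity_gap(s):
    """|(eps g)(s) - ((1/2) * integral of g - (eps-tilde g)(s))|; should be near zero."""
    return abs(eps(s) - (0.5 * integrate(g, LO, HI) - eps_tilde(s)))
```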

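Now $L = \tilde L + E$ with $E = \begin{pmatrix} 0 & 0 \\ \varepsilon_1 + \tilde\varepsilon_1 & 0 \end{pmatrix}$. Since $2(\varepsilon_1 + \tilde\varepsilon_1)$ equals integration over $\mathbb{R}$ in the
first argument and $\int \psi_\tau = 0$, we obtain

$$L S_\tau^{R} = \tilde L(S_\tau - \psi_\tau \otimes \tilde\varepsilon\phi_\tau) + E S_\tau + \beta_N L(\psi_\tau \otimes 1)$$
$$= \tilde L(S_\tau - \psi_\tau \otimes \tilde\varepsilon\phi_\tau) + \beta_N L_1(\psi_\tau \otimes 1) + \beta_N L_2(1 \otimes \psi_\tau).$$

The second equality holds, for $(E S_\tau)_{21} = \frac{1}{2} \int_{-\infty}^{\infty} S_\tau(u, t)\,du = \beta_N \int_0^{\infty} \psi_\tau(t + z)\,dz =
\beta_N (\tilde\varepsilon\psi_\tau)(t)$. Finally, this gives $K_\tau$ a similar decomposition to that of $K_{GOE}$:

$$K_\tau = \tilde L(S_\tau - \psi_\tau \otimes \tilde\varepsilon\phi_\tau) + L_1(\psi_\tau \otimes \beta_N) + L_2(\beta_N \otimes \psi_\tau) + K^{\varepsilon}. \tag{27}$$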

3.3. Generalized Fredholm determinants 

For any fixed $s_0\in\mathbb R$, we are interested in the convergence rate of $\tilde F_{N,1}(s')$ to $F_1(s')$ for
all $s'\ge s_0$. In what follows, we show that this relies on the operator convergence of $K_\tau$
to $K_{GOE}$.

First, we note that the determinants in (9), (16) and (24) are not the usual Fredholm
determinants (see, e.g., [21] for an introduction to the Fredholm determinant), as the $\varepsilon$
term on the lower-left position of the matrix kernels is not of trace class. Tracy and
Widom [35] first observed the problem and proposed a solution by introducing weighted
Hilbert spaces and regularized 2-determinants, which we adopt here.

Consider the determinant in (9). Let $\rho$ be a weight function such that (1) its reciprocal
$\rho^{-1}\in L^1((x',\infty))$; and (2) $S_{N,1}\in L^2((x',\infty);\rho)\cap L^2((x',\infty);\rho^{-1})$. Then $\varepsilon: L^2((x',\infty);\rho)\to L^2((x',\infty);\rho^{-1})$ is Hilbert-Schmidt and $K_N$ can be regarded as a $2\times 2$ matrix kernel
on the space $L^2((x',\infty);\rho)\oplus L^2((x',\infty);\rho^{-1})$. In addition, by the second condition on $\rho$,
the diagonal elements of $K_N$ are trace class on $L^2((x',\infty);\rho)$ and $L^2((x',\infty);\rho^{-1})$, respectively.

For a Hilbert-Schmidt operator $T$ with eigenvalues $\mu_k$, its regularized 2-determinant [12]
is defined as $\det_2(I-T) = \prod_k(1-\mu_k)e^{\mu_k}$. If the diagonal elements of $T$ are trace class,
then we define the generalized Fredholm determinant for $T$ as

$$\det(I-T) = {\det}_2(I-T)\exp(-\operatorname{tr}T). \qquad (28)$$

As remarked in [35], the definition (28) is independent of the choice of p and allows the 
derivation in [34] that yields (9), (10) and eventually (15). 
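In finite dimensions, definition (28) reduces to the ordinary determinant, which gives a simple sanity check. The sketch below works with a $2\times 2$ matrix only; the helper names are hypothetical and the finite-dimensional setting is an illustrative assumption, not the weighted-space construction used in the paper.

```python
import cmath

def eigenvalues_2x2(t):
    """Eigenvalues of a 2x2 matrix via the characteristic polynomial."""
    (a, b), (c, d) = t
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

def det2(t):
    """Regularized 2-determinant: prod_k (1 - mu_k) * exp(mu_k)."""
    prod = 1
    for mu in eigenvalues_2x2(t):
        prod *= (1 - mu) * cmath.exp(mu)
    return prod

def generalized_det(t):
    """Finite-dimensional analogue of (28): det2(I - T) * exp(-tr T)."""
    (a, _), (_, d) = t
    return det2(t) * cmath.exp(-(a + d))

def plain_det_i_minus(t):
    """Ordinary determinant of I - T, for comparison."""
    (a, b), (c, d) = t
    return (1 - a) * (1 - d) - b * c
```

For any $2\times 2$ matrix the two functions `generalized_det` and `plain_det_i_minus` agree up to rounding, since the exponential factors cancel against the trace.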

Change the domain to $(s',\infty)$ with $s' = \tau^{-1}(x')$ and the weight function to $\rho\circ\tau$,
and abbreviate $L^2((s',\infty);g)$ as $L^2(g)$ for any suitable $g$. Then, $K_\tau$ and $K_{GOE}$ are members
of the operator class $\mathcal A$ of $2\times 2$ Hilbert-Schmidt operator matrices on $L^2(\rho)\oplus L^2(\rho^{-1})$
with trace class diagonal entries. Definition (28) and previous derivations in
Section 3.2 remain valid.

In order to make the latter argument more explicit, it is convenient to make a specific
choice of the weight function $\rho$. In particular, on the $s$-scale, we choose

$$\rho(s) = 1+\exp(|s|). \qquad (29)$$

This implies that on the $x$-scale, we specify the weight function as $\rho\circ\tau^{-1}$, that is,

$$(\rho\circ\tau^{-1})(x) = 1+\exp(|x-\mu_{n,N}|/\sigma_{n,N}).$$

It is straightforward to verify that the required conditions are all satisfied.
With a rigorous definition of the determinants, we now relate the convergence of $\tilde F_{N,1}$
to $F_1$ to that of $K_\tau$ to $K_{GOE}$. First of all, simple manipulation leads to

$$|\tilde F_{N,1}(s') - F_1(s')| \le \frac{|\tilde F_{N,1}^2(s') - F_1^2(s')|}{F_1(s')} = \frac{1}{F_1(s')}|\det(I-K_\tau)-\det(I-K_{GOE})|. \qquad (30)$$

To bound the difference between the determinants, we have the following Lipschitz-type
inequality. Here and after, $\|\cdot\|_1$ and $\|\cdot\|_2$ denote the trace class norm and Hilbert-Schmidt
norm, respectively.

Proposition 1. Let $A, B\in\mathcal A$, and let $\det(I-A)$, $\det(I-B)$ be defined as in (28). If
$\sum_{i=1}^2\|A_{ii}-B_{ii}\|_1 + \sum_{i\ne j}\|A_{ij}-B_{ij}\|_2 \le 1/2$, then

$$|\det(I-A)-\det(I-B)| \le M(B)\Bigl(\sum_{i=1}^2\|A_{ii}-B_{ii}\|_1 + \sum_{i\ne j}\|A_{ij}-B_{ij}\|_2\Bigr), \qquad (31)$$

where $M(B) = 2|\det(I-B)| + 2\exp[2(1+\|B\|_2)^2 + \sum_i\|B_{ii}\|_1]$.

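Proof. [16], Proposition 3, established a similar bound to (31), but with $M(B)$ replaced
by

$$C(A,B) = |e^{-\operatorname{tr}A}|\exp\Bigl[\tfrac12(1+\|A\|_2+\|B\|_2)^2\Bigr] + |{\det}_2(I-B)|\,\frac{|e^{-\operatorname{tr}A}-e^{-\operatorname{tr}B}|}{|\operatorname{tr}A-\operatorname{tr}B|}.$$

We now bound $C(A,B)$ by the above claimed constant $M(B)$.

Observe that for $|x|\le 1/2$, $|e^x-1|\le 2|x|$. Therefore, when $\sum_{i=1}^2\|A_{ii}-B_{ii}\|_1 + \sum_{i\ne j}\|A_{ij}-B_{ij}\|_2\le 1/2$, we have $|\operatorname{tr}A-\operatorname{tr}B|\le\sum_{i=1}^2\|A_{ii}-B_{ii}\|_1\le 1/2$, which in turn
implies $|e^{-\operatorname{tr}A}-e^{-\operatorname{tr}B}|\le 2|\operatorname{tr}A-\operatorname{tr}B||e^{-\operatorname{tr}B}|$. Hence, for the terms in $C(A,B)$, we have

$$|e^{-\operatorname{tr}A}| \le |e^{-\operatorname{tr}B}-e^{-\operatorname{tr}A}| + |e^{-\operatorname{tr}B}| \le |e^{-\operatorname{tr}B}|(2|\operatorname{tr}A-\operatorname{tr}B|+1)$$

$$\le |e^{-\operatorname{tr}B}|\Bigl(2\sum_i\|A_{ii}-B_{ii}\|_1+1\Bigr) \le 2\exp(\|B_{11}\|_1+\|B_{22}\|_1)$$

and

$$|{\det}_2(I-B)|\,\frac{|e^{-\operatorname{tr}A}-e^{-\operatorname{tr}B}|}{|\operatorname{tr}A-\operatorname{tr}B|} \le 2|{\det}_2(I-B)||e^{-\operatorname{tr}B}| = 2|\det(I-B)|.$$

Moreover, we observe that

$$1+\|A\|_2+\|B\|_2 \le 1+2\|B\|_2+\|A-B\|_2 \le 1+2\|B\|_2+\sum_{i=1}^2\|A_{ii}-B_{ii}\|_1+\sum_{i\ne j}\|A_{ij}-B_{ij}\|_2 \le 2+2\|B\|_2.$$

Plugging all these bounds into $C(A,B)$, we obtain the claimed form of $M(B)$. □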
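The elementary inequality $|e^x-1|\le 2|x|$ on $|x|\le 1/2$, which drives the proof above, can be checked numerically on a grid. The sketch below is only an illustration (the function name `max_ratio` and the grid size are arbitrary choices).

```python
import math

def max_ratio(num_points=2001, radius=0.5):
    """Largest value of |exp(x) - 1| / |x| over a uniform grid on [-radius, radius]."""
    worst = 0.0
    for k in range(num_points):
        x = -radius + 2.0 * radius * k / (num_points - 1)
        if x == 0.0:
            continue  # the ratio tends to 1 as x -> 0; skip the removable singularity
        worst = max(worst, abs(math.exp(x) - 1.0) / abs(x))
    return worst
```

Since $(e^x-1)/x$ is increasing in $x$, the maximum over the grid is attained at the right endpoint $x = 1/2$ and stays well below 2.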
Remark 1. Proposition 1 refines [16], Proposition 3, by having the leading constant
$M(B)$ of the bound depend only on $B$, which is important for deriving properties
of the $C(s_0)$ function later.

3.4. Decomposition of $K_\tau - K_{GOE}$

By Proposition 1, to prove Theorem 1 is essentially to control the entrywise convergence
rate of $K_\tau$ to $K_{GOE}$. To this end, we construct a telescopic decomposition of $K_\tau - K_{GOE}$
into sums of simpler matrix kernels whose entries are more tractable.

To explain the intuition behind the decomposition, we introduce constants $\tilde\mu_{n,N}$
and $\tilde\sigma_{n,N}$ as

$$\tilde\mu_{n,N} = (\sqrt{n_+}+\sqrt{N_+})^2, \qquad \tilde\sigma_{n,N} = (\sqrt{n_+}+\sqrt{N_+})\Bigl(\frac{1}{\sqrt{n_+}}+\frac{1}{\sqrt{N_+}}\Bigr)^{1/3}. \qquad (32)$$

In [10], it was shown that $(\mu_{n,N},\sigma_{n,N}) = (\tilde\mu_{n-1,N-1},\tilde\sigma_{n-1,N-1})$ is 'optimal' for $\psi_\tau$ in
the sense that $|\psi_\tau - G| = O(N^{-2/3})$, but suboptimal for $\phi_\tau$ as $|\phi_\tau - G| = O(N^{-1/3})$.
However, later in Proposition 2, we will show that $|\phi_\tau - G - \Delta_NG'| = O(N^{-2/3})$ for

$$\Delta_N = \frac{\mu_{n,N}-\tilde\mu_{n-2,N}}{\tilde\sigma_{n-2,N}} = O(N^{-1/3}). \qquad (33)$$

(For a proof, see Section A.5.) These bounds suggest that, in the decomposition, we
align $\psi_\tau$ with $G$, and $\phi_\tau$ with $G+\Delta_NG'$.
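The order $\Delta_N = O(N^{-1/3})$ can be illustrated numerically. The sketch below rests on two assumptions that should be checked against the definitions in the paper: that $n_\pm = n\pm 1/2$ in (19) and (32), and that $\Delta_N = (\mu_{n,N}-\tilde\mu_{n-2,N})/\tilde\sigma_{n-2,N}$. The function names and the choice $n = \gamma N$ with $\gamma = 2$ are hypothetical.

```python
import math

def centering_scaling(n, N, shift):
    """Constants of the common form in (19) (shift=-0.5) and (32) (shift=+0.5)."""
    a, b = math.sqrt(n + shift), math.sqrt(N + shift)
    return (a + b) ** 2, (a + b) * (1.0 / a + 1.0 / b) ** (1.0 / 3.0)

def delta_n(n, N):
    """Delta_N under the assumed form (mu_{n,N} - tilde mu_{n-2,N}) / tilde sigma_{n-2,N}."""
    mu, _ = centering_scaling(n, N, -0.5)
    mu_t, sigma_t = centering_scaling(n - 2, N, 0.5)
    return (mu - mu_t) / sigma_t

def scaled_delta(N, gamma=2):
    """Delta_N * N^{1/3} along n = gamma * N; roughly constant if Delta_N = O(N^{-1/3})."""
    return delta_n(gamma * N, N) * N ** (1.0 / 3.0)
```

Along $n = 2N$, the scaled quantity `scaled_delta(N)` stabilizes as $N$ grows, which is consistent with the stated rate.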

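Let $G_N = G+\Delta_NG'$ and $S_{A_N} = G\diamond G_N + G_N\diamond G$. We obtain

$$S_{A_N} - G\otimes\tilde\varepsilon G_N = S_A - G\otimes\tilde\varepsilon G,$$

for $\tilde\varepsilon G' = -G$ and

$$\int_0^\infty [G(s+z)G'(t+z) + G'(s+z)G(t+z)]\,dz = \int_0^\infty \frac{d}{dz}[G(s+z)G(t+z)]\,dz = -G(s)G(t),$$

so that both $S_{A_N}-S_A$ and $G\otimes\tilde\varepsilon G_N - G\otimes\tilde\varepsilon G$ equal $-\Delta_N\,G\otimes G$.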

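This, together with (18) and (27), leads to the decomposition

$$K_\tau - K_{GOE} = \tilde L(S_\tau - S_{A_N}) + \tilde L(G\otimes\tilde\varepsilon G_N - \psi_\tau\otimes\tilde\varepsilon\phi_\tau) + L_1\Bigl(\psi_\tau\otimes\beta_N - G\otimes\tfrac{1}{\sqrt2}\Bigr) + L_2\Bigl(\beta_N\otimes\psi_\tau - \tfrac{1}{\sqrt2}\otimes G\Bigr). \qquad (34)$$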

3.5. Laguerre asymptotics and operator bounds 

Here we collect a set of intermediate results to be used repeatedly in the proof of Theorem 1.

To start with, we consider the asymptotics of $\phi_\tau$ and $\psi_\tau$ and their derivatives. Recalling
that $G = \frac{1}{\sqrt2}\mathrm{Ai}$ and $G_N = G+\Delta_NG'$, we have the following.

Proposition 2. Let $\phi_\tau$, $\psi_\tau$ and $\Delta_N$ be defined as in (21) and (33). Assume that (8)
holds, and that as $N\to\infty$, $n = n(N)\to\infty$ with $n/N\to\gamma\in[1,\infty)$. Then, for any
given $s_0$, there exists an integer $N_0(s_0,\gamma)$ such that when $N\ge N_0(s_0,\gamma)$, for all $s\ge s_0$,

$$|\psi_\tau(s)|,\ |\psi_\tau'(s)| \le C(s_0)\exp(-s), \qquad (35)$$

$$|\phi_\tau(s)|,\ |\phi_\tau'(s)| \le C(s_0)\exp(-s), \qquad (36)$$

$$|\psi_\tau(s)-G(s)|,\ |\psi_\tau'(s)-G'(s)| \le C(s_0)N^{-2/3}\exp(-s), \qquad (37)$$

$$|\phi_\tau(s)-G_N(s)|,\ |\phi_\tau'(s)-G_N'(s)| \le C(s_0)N^{-2/3}\exp(-s), \qquad (38)$$

where $C(\cdot)$ is continuous and non-increasing.

Integrating these bounds over $[s,\infty)$, we know that they remain valid if we replace
$\psi_\tau$, $\phi_\tau$, $G$ and $G_N$ with $\tilde\varepsilon\psi_\tau$, $\tilde\varepsilon\phi_\tau$, $\tilde\varepsilon G$ and $\tilde\varepsilon G_N$ on the left-hand sides. The proof
of Proposition 2 involves careful Liouville-Green analysis on the solution of certain differential
equations and will be discussed in detail later in Section 5.

On the other hand, for $G$ and $G_N$, we have the following bounds from [26], page 394.
Note that the bounds for $G_N$ and $G_N'$ do not depend on $N$, for $\Delta_N$ is uniformly bounded.

Lemma 1. Fix $\beta > 0$ and $k\ge 0$. Then, for all $s\ge s_0$,

$$|s^kG(s)|,\ |s^kG_N(s)|,\ |s^kG'(s)|,\ |s^kG_N'(s)| \le C(s_0)\exp(-\beta s),$$

where $C(s_0)$ is continuous and non-increasing.

For a proof of the lemma, see [22]. Integrating the bounds for $|G|$ and $|G_N|$ over $[s,\infty)$,
we obtain that $|\tilde\varepsilon G|$ and $|\tilde\varepsilon G_N|$ are also bounded by $C(s_0)e^{-\beta s}$.

For a later operator convergence argument, we will need simple bounds for certain
norms of an operator $D: L^2(\rho_1)\to L^2(\rho_2)$ with kernel $D(u,v) = \alpha(u)\beta(v)(a\diamond b)(u,v)$, where
$\{\rho_1,\rho_2\}\subset\{\rho,\rho^{-1}\}$ with $\rho$ given in (29). In particular, we have

Lemma 2 ([16]). Let $D: L^2(\rho_1)\to L^2(\rho_2)$ have kernel $D(u,v) = \alpha(u)\beta(v)(a\diamond b)(u,v)$.
Suppose that $\{\rho_1,\rho_2\}\subset\{\rho,\rho^{-1}\}$ and that, for $u\ge s'$,

$$|\alpha(u)|\le\alpha_0e^{\alpha_1u}, \quad |\beta(u)|\le\beta_0e^{\beta_1u}, \quad |a(u)|\le a_0e^{-a_1u}, \quad |b(u)|\le b_0e^{-b_1u}, \qquad (39)$$

with $a_1-\alpha_1,\ b_1-\beta_1\ge 1$. Then the Hilbert-Schmidt norm satisfies

$$\|D\|_2 \le C\,\frac{\alpha_0\beta_0a_0b_0}{a_1+b_1}\exp[-(a_1+b_1-\alpha_1-\beta_1)s'+|s'|], \qquad (40)$$

where $C = C(a_1,\alpha_1,b_1,\beta_1)$. If $\rho_1 = \rho_2$, the trace norm $\|D\|_1$ satisfies the same bound.

3.6. Operator convergence: Proof of Theorem 1 

Abbreviate the terms in the decomposition (34) as

$$K_\tau - K_{GOE} = \delta^R + \delta_0^F + \delta_1^F + \delta_2^F.$$

We work out below entrywise bounds for each of these $\delta$ terms and then apply Proposition 1
to complete the proof of Theorem 1. In what follows, we use the abbreviation $D^{(k)}f$,
$k = -1,0,1$, to denote $\tilde\varepsilon f$, $f$ and $f'$, respectively. Moreover, the unspecified norm $\|\cdot\|$ denotes
the Hilbert-Schmidt norm $\|\cdot\|_2$ for off-diagonal entries and the trace class norm $\|\cdot\|_1$
for diagonal ones.

$\delta^R$ term

Recall that $\delta^R = \tilde L(S_\tau - S_{A_N})$ with $S_\tau = \phi_\tau\diamond\psi_\tau + \psi_\tau\diamond\phi_\tau$ and $S_{A_N} = G_N\diamond G + G\diamond G_N$.
Regardless of the signs, we have the following unified expression for the entries of $\delta^R$:

$$(\delta^R)_{ij} = D^{(k)}(\phi_\tau-G_N)\diamond D^{(l)}\psi_\tau + D^{(k)}G_N\diamond D^{(l)}(\psi_\tau-G) + D^{(k)}(\psi_\tau-G)\diamond D^{(l)}\phi_\tau + D^{(k)}G\diamond D^{(l)}(\phi_\tau-G_N), \qquad (41)$$

for $i,j\in\{1,2\}$, $k\in\{-1,0\}$ and $l\in\{0,1\}$. By Proposition 2 and Lemma 1, we find that
for any of the four terms in (41), condition (39) is satisfied with $\alpha_0 = \beta_0 = 1$, $\alpha_1 = \beta_1 = 0$,
$a_1 = b_1 = 1$ and $\{a_0,b_0\} = \{C(s_0),C(s_0)N^{-2/3}\}$. So Lemma 2 implies

$$\|(\delta^R)_{ij}\| \le C(s_0)N^{-2/3}\exp(-2s'+|s'|). \qquad (42)$$

By a simple triangle inequality, we can choose $C(s_0)$ in the last display as the sum of
products of continuous and non-increasing functions, which can be seen from the term
$(\alpha_0\beta_0a_0b_0)/(a_1+b_1)$ in (40). Moreover, the term $C$ in (40) is a universal constant for
fixed $a_1$, $\alpha_1$, $b_1$ and $\beta_1$ here. Hence, the final $C(s_0)$ function remains continuous and
non-increasing.

Finite rank terms

For a rank one operator $a\otimes b: L^2(\rho_1)\to L^2(\rho_2)$ with kernel $a(s)b(t)$, its norm is

$$\|a\otimes b\| = \|a\|_{2,\rho_2}\|b\|_{2,\rho_1^{-1}}.$$

Here, the norm can be either trace class or Hilbert-Schmidt, since the two agree for rank
one operators. In addition, for any $g$, $\|a\|_{2,g}^2 = \int_{s'}^\infty|a(s)|^2g(s)\,ds$. Now consider matrices
of rank one operators on $L^2(\rho)\oplus L^2(\rho^{-1})$. Write $\|\cdot\|_+$ and $\|\cdot\|_-$ for $\|\cdot\|_{2,\rho}$ and $\|\cdot\|_{2,\rho^{-1}}$,
respectively. [16], equation (213), gives the following bound:

$$\begin{pmatrix}\|a_{11}\otimes b_{11}\|_1 & \|a_{12}\otimes b_{12}\|_2\\ \|a_{21}\otimes b_{21}\|_2 & \|a_{22}\otimes b_{22}\|_1\end{pmatrix} \le \begin{pmatrix}\|a_{11}\|_+\|b_{11}\|_- & \|a_{12}\|_+\|b_{12}\|_+\\ \|a_{21}\|_-\|b_{21}\|_- & \|a_{22}\|_-\|b_{22}\|_+\end{pmatrix}. \qquad (43)$$
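The rank one norm identity used above has a direct discrete analogue that can be checked by hand. The sketch below replaces the weighted $L^2$ spaces by weighted sequence spaces with positive weights; this discretization and the function names are illustrative assumptions, not part of the paper.

```python
import math

def weighted_hs_norm(kernel, w_out, w_in):
    """Hilbert-Schmidt norm of a matrix viewed as a map l2(w_in) -> l2(w_out)."""
    total = 0.0
    for i, row in enumerate(kernel):
        for j, k_ij in enumerate(row):
            # an orthonormal basis of l2(w_in) is e_j / sqrt(w_in[j])
            total += k_ij ** 2 * w_out[i] / w_in[j]
    return math.sqrt(total)

def rank_one_norm(a, b, w_out, w_in):
    """Product ||a||_{2,w_out} * ||b||_{2,1/w_in}, the analogue of ||a (x) b||."""
    norm_a = math.sqrt(sum(x * x * w for x, w in zip(a, w_out)))
    norm_b = math.sqrt(sum(y * y / w for y, w in zip(b, w_in)))
    return norm_a * norm_b
```

For a kernel of the form `kernel[i][j] = a[i] * b[j]`, the two functions return the same value, mirroring the role of $\rho_2$ on the output side and $\rho_1^{-1}$ on the input side.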

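First consider $\delta_0^F$. We reorganize it as

$$\delta_0^F = -\tilde L(\psi_\tau\otimes\tilde\varepsilon\phi_\tau - G\otimes\tilde\varepsilon G_N) = -\tilde L[\psi_\tau\otimes\tilde\varepsilon(\phi_\tau-G_N) + (\psi_\tau-G)\otimes\tilde\varepsilon G_N] = \delta_0^{F,1}+\delta_0^{F,2}.$$

The entries of $\delta_0^{F,i}$, $i = 1,2$, are all of the form $a\otimes b$, with $a$ and $b$ chosen from $D^{(k)}\psi_\tau$,
$D^{(k)}(\phi_\tau-G_N)$, $D^{(k)}(\psi_\tau-G)$ and $D^{(k)}G_N$, for $k\in\{-1,0,1\}$.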
Observe that for $\eta\ge 2$ we have

$$\int_{s'}^\infty\exp(-\eta s)\rho^{\pm1}(s)\,ds \le \frac{2}{\eta-1}\exp(-\eta s'\pm|s'|) \le \frac{4}{\eta}\exp(-\eta s'+|s'|). \qquad (44)$$

Together with Proposition 2 and Lemma 1, this implies

$$\|D^{(k)}\psi_\tau\|_\pm^2,\ \|D^{(k)}G_N\|_\pm^2 \le C(s_0)\exp(-2s'+|s'|),$$

$$\|D^{(k)}(\psi_\tau-G)\|_\pm^2,\ \|D^{(k)}(\phi_\tau-G_N)\|_\pm^2 \le C(s_0)N^{-4/3}\exp(-2s'+|s'|).$$

These bounds, together with the triangle inequality and (43), yield

$$\|(\delta_0^F)_{11}\|_1 \le \|\psi_\tau\otimes\tilde\varepsilon(\phi_\tau-G_N)\|_1 + \|(\psi_\tau-G)\otimes\tilde\varepsilon G_N\|_1$$

$$\le \|\psi_\tau\|_+\|\tilde\varepsilon(\phi_\tau-G_N)\|_- + \|\psi_\tau-G\|_+\|\tilde\varepsilon G_N\|_- \le C(s_0)N^{-2/3}\exp(-2s'+|s'|).$$

Similarly, we obtain the bounds for the other entries. In summary, we have

$$\|(\delta_0^F)_{ij}\| \le C(s_0)N^{-2/3}\exp(-2s'+|s'|). \qquad (45)$$

Switch to $\delta_1^F$ and $\delta_2^F$. Recall that $\delta_1^F = L_1(\psi_\tau\otimes\beta_N - G\otimes\frac{1}{\sqrt2})$ and $\delta_2^F = L_2(\beta_N\otimes\psi_\tau - \frac{1}{\sqrt2}\otimes G)$.
Due to their similarity, we take $\delta_1^F$ as an example and the same analysis applies
to $\delta_2^F$ with obvious modification. We further decompose $\delta_1^F$ as

$$\delta_1^F = L_1\Bigl[(\psi_\tau-G)\otimes\beta_N + G\otimes\Bigl(\beta_N-\tfrac{1}{\sqrt2}\Bigr)\Bigr].$$

By (43), the essential elements we need to bound are $\|D^{(k)}(\psi_\tau-G)\|_\pm$, $\|D^{(k)}G\|_\pm$ and
$\|1\|_-$ for $k = -1$ and $0$. The bounds related to $D^{(k)}(\psi_\tau-G)$ have already been obtained.
For the other two terms, (44) and Lemma 1 give

$$\|D^{(k)}G\|_\pm^2 \le C(s_0)\exp(-2s'+|s'|)$$

and

$$\|1\|_-^2 = \int_{s'}^\infty[1+\exp(|s|)]^{-1}\,ds \le \int_{-\infty}^\infty\exp(-|s|)\,ds = 2.$$
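The bound $\|1\|_-^2\le 2$ can be confirmed numerically: with the weight (29), the integral over the whole line equals $2\log 2\approx 1.386$. The following sketch uses a plain midpoint rule; the truncation point `upper=40.0`, the cell count and the function names are arbitrary illustrative choices.

```python
import math

def inv_weight(s):
    """Reciprocal of the weight rho(s) = 1 + exp(|s|) from (29)."""
    return 1.0 / (1.0 + math.exp(abs(s)))

def tail_integral(s_prime, upper=40.0, cells=80000):
    """Midpoint-rule approximation of the integral of 1/rho over [s_prime, upper]."""
    h = (upper - s_prime) / cells
    return h * sum(inv_weight(s_prime + (k + 0.5) * h) for k in range(cells))
```

Calling `tail_integral(-40.0)` approximates the integral over the whole real line, and any larger `s_prime` gives a smaller value, so the bound by 2 holds uniformly in $s'$.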

Since $\beta_N - \frac{1}{\sqrt2} = O(N^{-1})$ (for a proof, see Section A.5), we have

$$\|(\delta_1^F)_{11}\|_1 \le \|(\psi_\tau-G)\otimes\beta_N\|_1 + \|G\otimes(\beta_N-1/\sqrt2)\|_1$$

$$\le \|\psi_\tau-G\|_+\|\beta_N\|_- + \|G\|_+\|\beta_N-1/\sqrt2\|_-$$

$$\le C(s_0)N^{-2/3}\exp(-s'+|s'|/2) + C(s_0)N^{-1}\exp(-s'+|s'|/2)$$

$$\le C(s_0)N^{-2/3}\exp(-s'/2).$$

In a similar vein, the same bound can be obtained for $\|(\delta_1^F)_{12}\|_2$ and entries of $\delta_2^F$.
Therefore, we conclude that

$$\|(\delta_1^F)_{ij}\|,\ \|(\delta_2^F)_{ij}\| \le C(s_0)N^{-2/3}\exp(-s'/2). \qquad (46)$$

Now we prove Theorem 1.

Proof of Theorem 1. By the decomposition (34) and bounds (42), (45) and (46), the
triangle inequality gives the following bound for the norm of each entry in $K_\tau - K_{GOE}$:

$$\|(K_\tau-K_{GOE})_{ij}\| \le C(s_0)N^{-2/3}\exp(-s'/2).$$

We then apply Proposition 1 with $A = K_\tau$ and $B = K_{GOE}$ to get

$$|\det(I-K_\tau)-\det(I-K_{GOE})| \le M(K_{GOE})C(s_0)N^{-2/3}\exp(-s'/2), \qquad (47)$$

where $M(K_{GOE}) = 2\det(I-K_{GOE}) + 2\exp\{2(1+\|K_{GOE}\|_2)^2 + \sum_i\|K_{GOE,ii}\|_1\}$.

For the first term in $M(K_{GOE})$, we have $\det(I-K_{GOE}) = F_1^2(s')\le 1$. On the other
hand, we have

$$\|K_{GOE}\|_2 \le \sum_{i,j}\|(K_{GOE})_{ij}\|_2 \le \sum_i\|(K_{GOE})_{ii}\|_1 + \sum_{i\ne j}\|(K_{GOE})_{ij}\|_2.$$

In principle, one can show that, for each $(i,j)$, $\|(K_{GOE})_{ij}\|\le C(s_0)$, with $C(s_0)$ continuous
and non-increasing. Take $\|(K_{GOE})_{11}\|_1$ as an example. Let $H_\tau$ and $G_\tau$ be Hilbert-Schmidt
operators with kernels $\phi_\tau(x+y)$ and $\psi_\tau(x+y)$, respectively, then as an operator

$$(K_{GOE})_{11} = H_\tau G_\tau + G_\tau H_\tau + G\otimes\tfrac{1}{\sqrt2} - G\otimes\tilde\varepsilon G.$$

Since $\|AB\|_1\le\|A\|_2\|B\|_2$, we have

$$\|(K_{GOE})_{11}\|_1 \le 2\|H_\tau\|_2\|G_\tau\|_2 + \tfrac{1}{\sqrt2}\|G\|_{2,\rho}\|1\|_{2,\rho^{-1}} + \|G\|_{2,\rho}\|\tilde\varepsilon G\|_{2,\rho^{-1}}.$$

Each norm on the right-hand side of the last inequality is the square root of an integral of
a positive function on $(s',\infty)$ or $(s',\infty)^2$ that is bounded by the corresponding integral
over $(s_0,\infty)$ or $(s_0,\infty)^2$, which in turn is continuous and non-increasing in $s_0$. Hence,
$\|(K_{GOE})_{11}\|_1\le C(s_0)$. A similar argument applies to other entries. So, we can control
$M(K_{GOE})$ by a continuous and non-increasing $C(s_0)$. Finally, we complete the proof by
noting (30) and the fact that $1/F_1(s_0)$ is continuous and non-increasing. □

4. The smallest eigenvalue 

This section is dedicated to the proof of Theorem 2. 

Recall that two key components in the proof of Theorem 1 were: (1) determinantal
representations for both the finite and the limiting distributions; (2) a closed-form formula
for the finite sample kernel that yields a convenient decomposition of its difference from
the limiting kernel.

In what follows, we first establish the rate of convergence for matrices with even dimensions.
This is achieved by working out the above two components in the case of the
smallest eigenvalue. Then, we prove weak convergence for matrices with odd dimensions
using an interlacing property of the singular values.

4.1. Determinantal formula 

As before, we follow RMT notation to replace $p$ with $N$, and identify LOE$(N,\alpha)$ with
eigenvalues of $A\sim W_N(I,n)$ by (8).

Assume that $N$ is even. For the smallest eigenvalue $x_N$, for any $x'\ge 0$, [34] gives

$$1 - F_{N,N}(x') = P\{x_N > x'\} = \sqrt{\det(I-K_N\chi)}, \qquad (48)$$

where $\chi = 1_{0\le x\le x'}$ and $K_N$ is given in (10).

Due to a nonlinear transformation to be introduced, the formula (12) that we previously
used to represent $S_{N,1}$, the key component in $K_N$, is not the most appropriate here.
Instead, we find an alternative (yet equivalent) formula given in [1], Proposition 4.2, more
convenient. Indeed, let

fa(x\a) = (-l)\/^ fc (z;«)*- 1/2 l*>o, (49) 



with ajv = y/N{N + a). Then [1], Proposition 4.2, asserts that 



V /TV- 1 - 

Sn,i{x, y; a) = J -Sjv-1,2 (x, y; a + 1) + W — ——(j> N -i(x; a + l)(e(p N ^ 2 ){y, a + 1). (50) 

We write out the explicit dependence of these kernels on the parameter a as they are dif- 
ferent on the two sides of the equation. As a comparison, the previous representation (15) 
could be rewritten as 

S N ,i(x,y;a) = S N , 2 (x,y;a) + 4> N -i(x;a + l)(e<j) N )(y;a - 1). 

Its equivalence to (50) is given in the Appendix of [1]. 
Now, introduce the nonlinear transformation

$$\pi(s) = \exp(\nu^-_{n,N} - s\tau^-_{n,N}), \qquad (51)$$

where $\nu^-_{n,N}$ and $\tau^-_{n,N}$ are the rescaling constants in (6), with $p$ replaced by $N$. Incorporating
the transformation into $K_N$, we define

$$\bar K_\pi(s,t) = \sqrt{\pi'(s)\pi'(t)}\,K_N(\pi(s),\pi(t)). \qquad (52)$$

Let $\tilde F_{N,N}$ be the distribution of $(\log x_N - \nu^-_{n,N})/\tau^-_{n,N}$. Fix $s_0$, for any $s' = \pi^{-1}(x')\ge s_0$
and $f = 1_{s\ge s'}$, since $\det(I-K_N\chi) = \det(I-\bar K_\pi f)$, we obtain $1-\tilde F_{N,N}(-s') = \sqrt{\det(I-\bar K_\pi f)}$.
Thinking of $\bar K_\pi$ as a Hilbert-Schmidt operator with trace class diagonal
entries on $L^2([s',\infty);\rho)\oplus L^2([s',\infty);\rho^{-1})$ for a proper weight function $\rho$, we can drop $f$.
Now consider the representation of $\bar K_\pi$. For $b_N = \sqrt{(N-1)/N}$, let

4>tv(s) = -\/bNTr'(s)(j)N-2(ir(s);a + l), ip-^(s) = ^/b^n' (s)4> N ^ 1 (Tr(s);a + I). (53) 
Using [11], Proposition 5.4.2, we obtain 

S N - 1 ,2(ir(s),ir(t);a + l) = (ir'{ S y(t)y 1 / 2 (4>i V o^+i>nO0 n )(s,t). 

On the other hand, simple manipulation yields that the second term in (50), with x = n(s) 
and y = n(t), equals (- 7 r'(s)) _1 ^ 7r (s)(e<?i 7r )(i). Thus, S N ,i(ir(s),ir(t)) = (-7r'(s)) _1 5^(s,i) 
with 

Sf (s, i) = (^ o Vv + ^ o <j>„)(s, £) + (^ (g> £0 7r )(s, t). (54) 

In addition, we have 

{-d 2 b N s)(ir(s),ir(t)) = — — = • [-o 2 t) v (s,t)\, 

OtTT(t) T \s)-K' \t) 



(eiS NA )(Tr(s),ir(t)) = / e(7r(s) - z)Sn,i(z, n(t)) Az 

e(s — u)SN y i(ir(u) , n(t))Tr' (u) du = —(eiS n )(s,t). 



Supplying these equations to (10), we obtain that 



with $U(s) = \operatorname{diag}(1/\sqrt{-\pi'(s)},\,-\sqrt{-\pi'(s)})$. Observe that $\det(I-\bar K_\pi)$ remains unchanged
if we premultiply $\bar K_\pi$ with $U^{-1}(s_0)$ and postmultiply it with $U(s_0)$. Denoting the resulting
kernel by $K_\pi$, we obtain that

$$K_\pi(s,t) = Q_N(s)(LS_\pi^R + K^\varepsilon)(s,t)Q_N^{-1}(t) \qquad (55)$$

with $Q_N(s) = U^{-1}(s_0)U(s) = \operatorname{diag}(\sqrt{\pi'(s_0)/\pi'(s)},\,\sqrt{\pi'(s)/\pi'(s_0)})$ and that $1-\tilde F_{N,N}(-s') = \sqrt{\det(I-K_\pi)}$.

Recall that $G_1(-s') = 1-F_1(s')$. So, $\tilde F_{N,N}(-s')-G_1(-s') = F_1(s')-[1-\tilde F_{N,N}(-s')]$.
Similar to (30), we obtain

$$|\tilde F_{N,N}(-s')-G_1(-s')| \le \frac{1}{F_1(s')}|\det(I-K_\pi)-\det(I-K_{GOE})|.$$

Thus, as in the case of the largest eigenvalue, by Proposition 1, to prove Theorem 2 is to
control the entrywise norm of $K_\pi - K_{GOE}$. For this purpose, a convenient decomposition
of $K_\pi - K_{GOE}$ is crucial, to which we now turn.
4.2. Kernel difference decomposition

We derive below a decomposition of $K_\pi - K_{GOE}$. Despite the differences in actual formulas,
the general guideline of the decomposition is the same as that in Section 3.4.

To start with, we rewrite (55) using the right tail integration operator e. To this end, 
observe that J fa = and that 



-L + o ( ^). 



- _i /■■» _ (iv-i) 1 / 4 (n-i) 1 / 4 r((iv + i)/2) r r(n-i) i 1/2 

lN 2j_J hcK) 2(«-^)/ 2 (iv-i) r(n/2) Lr(iv-i). 

By the same argument that leads to (27), we obtain 

K«(a,t) = Qn(s)(K* + K* A + K F 2 + K^is^Q-^t), 
with the unspecified components given by 

K R = L(S v -fa®efa), 1^ 1 =Li(^®j8j V ), K F 2 = L 2 N ® fa). 

Define A^ = (y~ N - vfai.N-i)l T n-i,N-i = 0{N~ l l s ) and G N = G + A N G'. For 
Sa n = G o Gat + Gn o G, we have Sa^- — G ® eGat = 5a — G (g) eG. Abbreviate the 
terms in (18) as 

A GO e = K R + K[ + K£ + K £ . 

Then, 

k r _ r r = i^ -S A -fa®efa + G® eG) 

= L{S n -S An )~ Ufa ®efa-G<E> eG N ) = 6 RJ + 6 F . 
Further define 

S R ' D (s,t) = Q N {s)K R ( s ,t)Q N \t) - K R (s,t), 
6 F (s,t) = Q N (s)K F ti ( S ,t)Q]^(t) - Kf(s,t), i = 1,2, 
6%s,t) = Q N ( S )K s (s,t)Q^{t) - K e (s,t). 

Our final decomposition of $K_\pi - K_{GOE}$ is

$$K_\pi - K_{GOE} = \delta^{R,D} + \delta^{R,I} + \delta_0^F + \delta_1^F + \delta_2^F + \delta^\varepsilon. \qquad (56)$$

We remark that Proposition 2 remains valid if we replace $\phi_\tau$ and $\psi_\tau$ with $\phi_\pi$ and $\psi_\pi$,
respectively. The proof is similar to that to be presented in Section 5 for Proposition 2.
With these estimates, for each term in (56), we apply Lemma 2 to bound their entrywise
norms as in Section 3.6. This completes the proof of the rate of convergence part in
Theorem 2.
4.3. Weak convergence in the odd N case

We now establish weak convergence to the reflected Tracy-Widom law in the odd $N$
case. This is achieved by employing an interlacing property of the singular values. The
strategy follows from [30], Remark 5.

Assume that $N$ is odd and $n-1\ge N$. Let $X_{N+1}$ be an $(n+1)\times(N+1)$ matrix with
i.i.d. $N(0,1)$ entries and $X_N$ the $n\times N$ matrix obtained by deleting the last row and
the last column of $X_{N+1}$. Denote the smallest singular values of $X_{N+1}$ and $X_N$ by $\iota_{N+1}$
and $\iota_N$, respectively. We apply [13], Theorem 7.3.9, twice to obtain that $\iota_N\le\iota_{N+1}$.
Repeat the deletion operation on $X_N$ to obtain the $(n-1)\times(N-1)$ matrix $X_{N-1}$
and denote its smallest singular value by $\iota_{N-1}$. Then we obtain the 'sandwich' relation:

$$\iota_{N-1}\le\iota_N\le\iota_{N+1}.$$

Observe that for $k = N-1$, $N$ and $N+1$, $X_k'X_k$ are white Wishart matrices with the
smallest eigenvalues $x_k = \iota_k^2$. In addition, as $N\to\infty$ and $n/N\to\gamma>1$,

$$(\nu^-_{n,N}-\nu^-_{n-1,N-1})/\tau^-_{n-1,N-1} = O(N^{-1/3}) \quad\text{and}\quad \tau^-_{n,N}/\tau^-_{n-1,N-1} = 1+O(N^{-1}).$$

They together imply that the weak limits for the odd $N$ and the even $N$ sequences must
be the same. This completes the proof of Theorem 2.

5. Laguerre polynomial asymptotics 

In this section, we complete the proof of Proposition 2. The proof has the following
components. First, we take the Liouville-Green approach to analyze an intermediate
function that is connected to both $\phi_\tau$ and $\psi_\tau$. After recollecting some previous results in
[10, 15] for $\psi_\tau$, we give a detailed analysis of $\psi_\tau'$, $\psi_\tau'-G'$ and also strengthen a previous
bound on $\psi_\tau-G$. Finally, we transfer the bounds on quantities related to $\psi_\tau$ to those
related to $\phi_\tau$ by a change of variable argument.

5.1. Liouville-Green approach

Recall $(\tilde\mu_{n,N},\tilde\sigma_{n,N})$ in (32) and $\alpha$ in (8). We introduce the intermediate function

$$F_{n,N}(x) = (-1)^N\tilde\sigma_{n,N}^{-1/2}\sqrt{N!/n!}\,x^{\alpha/2+1}e^{-x/2}L_N^{\alpha+1}(x) \qquad (57)$$

as in [15], equation (5.1), and [10], Section 2.2.2. (Note: $\alpha = \alpha_N - 1$ for the constant $\alpha_N$
used in [15] and [10].) Then $\psi_\tau$ is related to $F_{n,N}$ as

1 f N^jn-iy^X^a^ 



l/vOO = —7=[ ; '■ — Fn-l.iV-lCMn.JV + SCTy^jv), 

V2 \ lbi~l,N-l / \fln,N + SCT n ,N 

Replacing the subscripts $(n-1,N-1)$ by $(n-2,N)$ in $\tilde\mu_{n-1,N-1}$, $\tilde\sigma_{n-1,N-1}$ and $F_{n-1,N-1}$
on the right-hand side, we also obtain the expression for $\phi_\tau(s)$.

Due to the close connection of $\psi_\tau$ and $\phi_\tau$ to $F_{n,N}$, the key element in the proof of
Proposition 2 becomes asymptotic analysis of $F_{n,N}$ and its derivative. To this end, the
Liouville-Green (LG) theory set out in Olver [26], Chapter 11, is useful, for it comes
with ready-made bounds on the difference between $F_{n,N}$ and the Airy function, and also
on the difference between their derivatives.

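To start with, we observe that $F_{n,N}$ satisfies a second-order differential equation,

$$F_{n,N}''(x) = \Bigl\{\frac14 - \frac{\kappa_N}{x} + \frac{\lambda_N^2-1/4}{x^2}\Bigr\}F_{n,N}(x), \qquad (58)$$

with $\kappa_N = \frac12(n+N+1)$ and $\lambda_N = \frac12(n-N)$. By rescaling $x = \kappa_N\xi$, setting $w_N(\xi) = F_{n,N}(x)$,
the equation becomes

$$w_N''(\xi) = \{\kappa_N^2f(\xi)+g(\xi)\}w_N(\xi),$$

where

$$f(\xi) = \frac{(\xi-\xi_-)(\xi-\xi_+)}{4\xi^2}, \qquad g(\xi) = -\frac{1}{4\xi^2}.$$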

The zeros of $f$ are given by $\xi_\pm = 2\pm\sqrt{4-\omega_N^2}$ for $\omega_N = 2\lambda_N/\kappa_N$. They are called the
turning points of the differential equation, for each separates an interval in which the
solutions are oscillating from one in which they are of exponential type. The LG approach
introduces a new independent variable, $\zeta$, and dependent variable, $W$, as

$$\zeta\Bigl(\frac{d\zeta}{d\xi}\Bigr)^2 = f(\xi), \qquad W = \Bigl(\frac{d\zeta}{d\xi}\Bigr)^{1/2}w_N.$$

Then the differential equation takes the form $W''(\zeta) = \{\kappa_N^2\zeta + v(\omega_N,\zeta)\}W(\zeta)$. Without
the perturbation term $v(\omega_N,\zeta)$, this is the Airy equation having linearly independent
solutions in terms of Airy functions $\mathrm{Ai}(\kappa_N^{2/3}\zeta)$ and $\mathrm{Bi}(\kappa_N^{2/3}\zeta)$. We focus on approximating
the recessive solution $\mathrm{Ai}(\kappa_N^{2/3}\zeta)$.
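The turning points can be computed directly, and on the $x$-scale the upper one, $\kappa_N\xi_+$, coincides with $(\sqrt{n+1/2}+\sqrt{N+1/2})^2$, which is the centering constant of (32) under the assumption $n_+ = n+1/2$, $N_+ = N+1/2$. The sketch below is an illustration of this algebraic identity; the function names are hypothetical.

```python
import math

def turning_points(n, N):
    """Return (kappa_N, xi_minus, xi_plus) for the rescaled equation."""
    kappa = 0.5 * (n + N + 1)
    lam = 0.5 * (n - N)
    omega = 2.0 * lam / kappa
    root = math.sqrt(4.0 - omega ** 2)
    return kappa, 2.0 - root, 2.0 + root

def f(xi, xi_minus, xi_plus):
    """f(xi) = (xi - xi_-)(xi - xi_+) / (4 xi^2); negative between the turning points."""
    return (xi - xi_minus) * (xi - xi_plus) / (4.0 * xi ** 2)

def mu_tilde(n, N):
    """Centering constant of the form (32), assuming n_+ = n + 1/2 and N_+ = N + 1/2."""
    return (math.sqrt(n + 0.5) + math.sqrt(N + 0.5)) ** 2
```

The sign of `f` changes at each turning point, which matches the description of oscillating versus exponential-type behavior on either side.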

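Let $\hat f = f/\zeta$. [26], Theorem 11.3.1, gives that

$$w_N(\xi) \propto \hat f^{-1/4}(\xi)\{\mathrm{Ai}(\kappa_N^{2/3}\zeta)+\varepsilon_2(\kappa_N,\xi)\},$$

where, uniformly for $\xi\in[2,\infty)$, the error term $\varepsilon_2$ satisfies

$$|\varepsilon_2(\kappa_N,\xi)| \le (\mathcal M/\mathcal E)(\kappa_N^{2/3}\zeta)\Bigl[\exp\Bigl\{\frac{\lambda_0}{\kappa_N}\mathcal F(\omega_N)\Bigr\}-1\Bigr], \qquad (59)$$

$$|\partial_\xi\varepsilon_2(\kappa_N,\xi)| \le \kappa_N^{2/3}\hat f^{1/2}(\xi)(\mathcal N/\mathcal E)(\kappa_N^{2/3}\zeta)\Bigl[\exp\Bigl\{\frac{\lambda_0}{\kappa_N}\mathcal F(\omega_N)\Bigr\}-1\Bigr]. \qquad (60)$$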


In the bounds, $\mathcal M$, $\mathcal E$ are the modulus and weight functions for the Airy function and $\mathcal N$
the phase function for its derivative ([26], pages 394-396). On the real line, $\mathcal E\ge 1$ and is
increasing, $0\le\mathcal M\le 1$ and $\mathcal N\ge 0$. Moreover, for all $x$,

$$|\mathrm{Ai}(x)| \le (\mathcal M/\mathcal E)(x), \qquad |\mathrm{Ai}'(x)| \le (\mathcal N/\mathcal E)(x). \qquad (61)$$

As $x\to\infty$, their asymptotics are given by

$$\mathcal E(x)\sim\sqrt2\,e^{(2/3)x^{3/2}}, \qquad \mathcal M(x)\sim\pi^{-1/2}x^{-1/4}, \qquad \mathcal N(x)\sim\pi^{-1/2}x^{1/4}. \qquad (62)$$

In addition, in the bounds (59) and (60), $\lambda_0\doteq 1.04$ and the analysis in [10], A.3, shows
that, uniformly for $\xi\in[2,\infty)$, for large enough $N$,

$$\exp\Bigl\{\frac{\lambda_0}{\kappa_N}\mathcal F(\omega_N)\Bigr\}-1 \le N^{-2/3}. \qquad (63)$$

Come back to $F_{n,N}$. The alignment in [10], equation (5) and A.1, shows that

$$F_{n,N}(x) = r_N\kappa_N^{1/6}\tilde\sigma_{n,N}^{-1/2}\hat f^{-1/4}(\xi)\{\mathrm{Ai}(\kappa_N^{2/3}\zeta)+\varepsilon_2(\kappa_N,\xi)\},$$

with $r_N = 1+O(N^{-1})$. Let $R_N(\xi) = (\zeta'(\xi)/\zeta_N')^{-1/2}$ with $\zeta_N' = \zeta'(\xi_+)$. As $(\zeta_N')^{-1} = \kappa_N^{-1/3}\tilde\sigma_{n,N}$
and $\hat f(\xi) = \zeta'(\xi)^2$, we can rewrite $F_{n,N}$ as

$$F_{n,N}(x) = r_NR_N(\xi)\{\mathrm{Ai}(\kappa_N^{2/3}\zeta)+\varepsilon_2(\kappa_N,\xi)\}. \qquad (64)$$

This representation serves as the starting point for all the subsequent asymptotic analysis
on $\phi_\tau$, $\psi_\tau$ and their derivatives.

From now on, without notice, all the inequalities are understood to hold uniformly for
$N\ge N_0(s_0,\gamma)$.

5.2. Summary of previous analysis: Bound for $|\psi_\tau(s)|$

Here, we summarize the previous analysis of $F_{n,N}$ in [10, 15], which gives the desired
bound for $|\psi_\tau(s)|$ in (35) and a crude estimate for $|\psi_\tau-G|$.
Let $x_{n,N}(s) = \tilde\mu_{n,N}+s\tilde\sigma_{n,N}$ and define

$$\theta_{n,N}(x_{n,N}(s)) = F_{n,N}(x_{n,N}(s))\Bigl(\frac{\tilde\mu_{n,N}}{x_{n,N}(s)}\Bigr). \qquad (65)$$

As $\tilde\sigma_{n,N}^{-1/2}N^{1/6}\le 1$, we obtain that, for all $s\ge 0$,

$$|F_{n,N}(x_{n,N}(s))| \le |F_{n,N}(x_{n,N}(s))\tilde\sigma_{n,N}^{1/2}N^{-1/6}| \le C\exp(-s),$$

where the latter inequality was obtained in [15], A.8. If $s_0<0$, then $\xi = x_{n,N}(s)/\kappa_N\ge 2$
uniformly for all $s\ge s_0$. In addition, Lemma 3 later shows that $|R_N(\xi)|\le 1+CN^{-2/3}|s|$
for $s\in[s_0,0]$. Therefore, we apply (59), (63) and (64) to obtain that

$$|F_{n,N}(x_{n,N}(s))| \le 2r_N|R_N(\xi)|(\mathcal M/\mathcal E)(\kappa_N^{2/3}\zeta) \le 4,$$

uniformly for $s\in[s_0,0]$. Hence, $|F_{n,N}(x_{n,N}(s))|\le C\exp(-s)$ for all $s\ge s_0$. Moreover, we
note that $\tilde\sigma_{n,N}/\tilde\mu_{n,N} = O(N^{-2/3})$. So, when $N\ge N_0(s_0)$, for all $s\ge s_0$,

$$\tilde\mu_{n,N}/x_{n,N}(s) \le (1+s_0\tilde\sigma_{n,N}/\tilde\mu_{n,N})^{-1} \le 2.$$

Hence, uniformly for $s\ge s_0$,

$$|\theta_{n,N}(x_{n,N}(s))| \le C(s_0)\exp(-s). \qquad (66)$$

Finally, for any $q_N = 1+O(N^{-1})$, El Karoui [10], Section 3.2, showed that, for all $s\ge s_0$,

$$|q_N\theta_{n,N}(x_{n,N}(s))-\mathrm{Ai}(s)| \le C(s_0)N^{-2/3}\exp(-s/2).$$

For $\psi_\tau(s)$, observe that $(\mu_{n,N},\sigma_{n,N}) = (\tilde\mu_{n-1,N-1},\tilde\sigma_{n-1,N-1})$. Using Stirling's formula,
we obtain that $\psi_\tau(s) = \frac{1}{\sqrt2}p_N\theta_{n-1,N-1}(x_{n-1,N-1}(s))$ for some $p_N = 1+O(N^{-1})$. Then,
we apply the last two displays to obtain

$$|\psi_\tau(s)| \le C(s_0)\exp(-s), \qquad |\psi_\tau(s)-G(s)| \le C(s_0)N^{-2/3}\exp(-s/2), \qquad (67)$$

uniformly for $s\ge s_0$.

Here, the first inequality gives the bound for $|\psi_\tau|$, while the bound on $|\psi_\tau(s)-G(s)|$
could be further improved; see (75). Note that we cannot apply these results directly
to $\phi_\tau$ since the 'optimal' rescaling constants $(\tilde\mu_{n-2,N},\tilde\sigma_{n-2,N})$ for $F_{n-2,N}$ do not agree
with the global constants $(\mu_{n,N},\sigma_{n,N})$.

5.3. Asymptotics of $|\psi_\tau'(s)|$, $|\psi_\tau'(s)-G'(s)|$ and $|\psi_\tau(s)-G(s)|$

Here, we derive bounds on $|\psi_\tau'|$ and $|\psi_\tau'-G'|$ and refine the bound on $|\psi_\tau(s)-G(s)|$.

5.3.1. Bound for $|\psi_\tau'(s)|$

To obtain bounds for $|\psi_\tau'|$, we study $|\partial_s\theta_{n,N}(x_{n,N}(s))|$. By the triangle inequality,

$$|\partial_s\theta_{n,N}(x_{n,N}(s))| \le \Bigl|\tilde\sigma_{n,N}F_{n,N}'(x_{n,N}(s))\frac{\tilde\mu_{n,N}}{x_{n,N}(s)}\Bigr| + \Bigl|\tilde\sigma_{n,N}F_{n,N}(x_{n,N}(s))\frac{\tilde\mu_{n,N}}{x_{n,N}^2(s)}\Bigr| = T_{N,1}(s)+T_{N,2}(s). \qquad (68)$$

In what follows, we deal with the two terms in order.

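The $T_{N,1}$ term. Recall that $\tilde\mu_{n,N}/x_{n,N}(s)\le 2$ for large $N$. So, we focus on $\tilde\sigma_{n,N}F_{n,N}'$,
which can be decomposed as $\tilde\sigma_{n,N}F_{n,N}' = \sum_{i=1}^4D_{n,N}^i$, with

$$D_{n,N}^1 = r_N\tilde\sigma_{n,N}\kappa_N^{-1}R_N'(\xi)\{\mathrm{Ai}(\kappa_N^{2/3}\zeta)+\varepsilon_2(\kappa_N,\xi)\}, \qquad D_{n,N}^2 = r_N[R_N^{-1}(\xi)-1]\mathrm{Ai}'(\kappa_N^{2/3}\zeta),$$

$$D_{n,N}^3 = r_N\mathrm{Ai}'(\kappa_N^{2/3}\zeta), \qquad D_{n,N}^4 = r_N\tilde\sigma_{n,N}\kappa_N^{-1}R_N(\xi)\partial_\xi\varepsilon_2(\kappa_N,\xi).$$

Due to different strategies used for the asymptotics on the $s$-scale, we divide $[s_0,\infty)$ into
$I_{1,N}\cup I_{2,N}$, with $I_{1,N} = [s_0,s_1N^{1/6})$ and $I_{2,N} = [s_1N^{1/6},\infty)$. The choice of $s_1$ is worked
out in Section A.6. Here, we note that $s_1\ge 1$ and that, for $s\ge s_1$,

$$\mathcal E^{-1}(\kappa_N^{2/3}\zeta) \le C\exp(-3s/2) \le C\exp(-s). \qquad (69)$$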
In addition, we will repeatedly use the following facts.

Lemma 3. Under the conditions of Proposition 2, when $N\ge N_0(s_0,\gamma)$, for all $s\in I_{1,N}$,

$$|R_N'(\xi)| \le C\gamma^{-1/2}(1+\gamma), \qquad |R_N(\xi)-1| \le CN^{-2/3}|s|,$$

$$|\kappa_N^{2/3}\zeta-s| \le (CN^{-2/3}s^2)\wedge\tfrac12|s|\wedge 1.$$

The proof of Lemma 3 is given in [22].

Case $s\in I_{1,N}$. Consider $D_{n,N}^1$ first. Recall that $r_N = 1+O(N^{-1})$. Together with
Lemma 3, this implies

$$|r_N\tilde\sigma_{n,N}\kappa_N^{-1}R_N'(\xi)| \le CN^{-2/3}. \qquad (70)$$

On the other hand, as $0\le\mathcal M\le 1$, (59), (61) and (63) together imply

$$|\mathrm{Ai}(\kappa_N^{2/3}\zeta)+\varepsilon_2(\kappa_N,\xi)| \le C(\mathcal M/\mathcal E)(\kappa_N^{2/3}\zeta) \le C\mathcal E^{-1}(\kappa_N^{2/3}\zeta).$$

For $s\ge 0$, Lemma 3 implies $\kappa_N^{2/3}\zeta\ge s/2$. Since $\mathcal E$ is monotone increasing, by (62),

$$|\mathrm{Ai}(\kappa_N^{2/3}\zeta)+\varepsilon_2(\kappa_N,\xi)| \le C\mathcal E^{-1}(s/2) \le Ce^{-(1/(3\sqrt2))s^{3/2}} \le C\exp(-s).$$

If $s_0<0$, we can replace the $C$ on the rightmost side with $C(s_0) = \max\{C,\max_{s\in[3s_0/2,0]}\mathcal E^{-1}(s)\}$,
which is continuous and non-increasing in $s_0$. Together with (70),
we obtain that

$$|D_{n,N}^1| \le C(s_0)N^{-2/3}\exp(-s).$$

(Here and after, we derive more stringent bounds with the $N^{-2/3}$ term whenever
possible. Although they are not necessary for bounding $|\psi_\tau'|$, they are useful in the later
study of $|\psi_\tau'(s)-G'(s)|$.)

For $D_{n,N}^2$, we first have $|r_NR_N^{-1}(\xi)-1| \le r_N|R_N^{-1}(\xi)-1|+|r_N-1|$. Lemma 3 implies
that $|R_N^{-1}(\xi)-1|\le CN^{-2/3}|s|$. Observing that $|r_N-1| = O(N^{-1})$, we obtain

$$|r_NR_N^{-1}(\xi)-1| \le CN^{-2/3}|s|.$$

For $|\mathrm{Ai}'(\kappa_N^{2/3}\zeta)|$, when $s\ge 0$, Lemma 3 gives $\kappa_N^{2/3}\zeta\in[s/2,3s/2]$. This, together with
Lemma 1, implies that

$$|\mathrm{Ai}'(\kappa_N^{2/3}\zeta)| \le C\exp(-3s/2). \qquad (71)$$

If $s_0<0$, we can replace the $C$ on the right-hand side with $C(s_0) = \max\{C,\max_{[3s_0/2,0]}|\mathrm{Ai}'(s)|\}$,
which is continuous and non-increasing. Then the last two displays
give

$$|D_{n,N}^2| \le C(s_0)N^{-2/3}|s|\exp(-3s/2) \le C(s_0)N^{-2/3}\exp(-s).$$

For $D_{n,N}^3$, we recall that $r_N = 1+O(N^{-1})$. Together with (71), this implies that

$$|D_{n,N}^3| \le C(s_0)\exp(-s).$$

For $D_{n,N}^4$, since $r_N = 1+O(N^{-1})$, $\zeta'(\xi) = \hat f^{1/2}(\xi)$ and $\zeta_N' = \kappa_N^{1/3}/\tilde\sigma_{n,N}$, (60) and (63)
imply

$$|D_{n,N}^4| = |r_N\tilde\sigma_{n,N}\kappa_N^{-1}R_N(\xi)\partial_\xi\varepsilon_2(\kappa_N,\xi)|$$

$$\le CN^{-2/3}\tilde\sigma_{n,N}\kappa_N^{-1/3}\zeta'(\xi)R_N(\xi)(\mathcal N/\mathcal E)(\kappa_N^{2/3}\zeta)$$

$$= CN^{-2/3}R_N^{-1}(\xi)(\mathcal N/\mathcal E)(\kappa_N^{2/3}\zeta).$$

Lemma 3 implies that $R_N^{-1}(\xi)\le C$ and $\kappa_N^{2/3}\zeta\in[s/2,3s/2]$, uniformly on $I_{1,N}$. So, (62)
gives

$$(\mathcal N/\mathcal E)(\kappa_N^{2/3}\zeta) \le Cs^{1/4}e^{-(1/(3\sqrt2))s^{3/2}} \le C\exp(-s)$$

for all $s\ge 0$. And if $s_0<0$, we can replace the $C$ on the rightmost side with $C(s_0) = \max\{C,\max_{s\in[3s_0/2,0]}(\mathcal N/\mathcal E)(s)\}$,
which is continuous and non-increasing in $s_0$. All these
elements together lead to

$$|D_{n,N}^4| \le C(s_0)N^{-2/3}\exp(-s).$$

Combining all the bounds on the $D_{n,N}^i$ terms, we obtain that $T_{N,1}\le C(s_0)\exp(-s)$
on $I_{1,N}$.

Case s e h,N In this case, we define D^ N = D^ N and D 2 n N = D 2 nN + D 3 N + D^ N . 
Consider D^ N first. By (59), (61) and (63), we obtain that for N > N {s ,j), 

\Dl N \ < ca^N^lR^/RNKORNimM/E^Tc). 

Observe that, uniformly on I2,n, 

Vn,NKN l \R' N /RNm<C, R N {OM(kJ 3 0<C S . (72) 

For a proof of (72), see [22]. On the other hand, (69) holds on I2,n- Thus, 

|A\,jvI < C.scxp(-3.s/2) < Cs 4 cxp(-s) < CW- 2/3 cxp(-,s). 

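For $\tilde D^2_{n,N}$, we can write it as $\tilde D^2_{n,N} = r_NR_N(\xi)[\mathrm{Ai}'(\kappa_N^{2/3}\zeta)R_N^{-2}(\xi) + \tilde\sigma_{n,N}\kappa_N^{-1}\partial_\xi\varepsilon_2(\kappa_N,\xi)]$.
By (60), (61) and (63) and the identity $R_N^{-2}(\xi) = \tilde\sigma_{n,N}\kappa_N^{-1/3}\hat f^{1/2}(\xi)$, we get the bound

\[ |\tilde D^2_{n,N}| \le CR_N^{-1}(\xi)(\mathcal{N}/\mathcal{E})(\kappa_N^{2/3}\zeta). \]

(62) suggests that $R_N^{-1}(\xi)\mathcal{N}(\kappa_N^{2/3}\zeta) \le CR_N^{-1}(\xi)\zeta^{1/4}\kappa_N^{1/6} = Cf^{1/4}(\xi)\tilde\sigma_{n,N}^{1/2} \le C\tilde\sigma_{n,N}^{1/2}$. The
last inequality holds as $f \le 4$ for $s \in I_{2,N}$. On the other hand, $\tilde\sigma_{n,N} \le C(\gamma)N^{1/3} \le Cs^4$ for
large $N$. Assembling all the pieces, we obtain $R_N^{-1}(\xi)\mathcal{N}(\kappa_N^{2/3}\zeta) \le Cs^2$. Together with (69),
this implies

\[ |\tilde D^2_{n,N}| \le Cs^2\exp(-3s/2) \le Cs^{-4}\exp(-s) \le CN^{-2/3}\exp(-s). \]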



Therefore,

\[ T_{N,1} \le CN^{-2/3}\exp(-s) \quad \text{on } I_{2,N}. \tag{73} \]



The $T_{N,2}$ term. This term is relatively easy to bound. Note that $\tilde\sigma_{n,N}/\tilde\mu_{n,N} = O(N^{-2/3})$
and that $T_{N,2}(s) = |\theta_{n,N}(x_{n,N}(s))\tilde\sigma_{n,N}/x_{n,N}(s)|$. So, for all $s \ge s_0$, $N \ge N_0(s_0)$,

\[ |\tilde\sigma_{n,N}/x_{n,N}(s)| = |s + \tilde\mu_{n,N}/\tilde\sigma_{n,N}|^{-1} \le C(s_0)N^{-2/3}. \]

Together with (66), this implies that for all $s \ge s_0$, $T_{N,2}(s) \le C(s_0)N^{-2/3}\exp(-s)$.

Summing up. By (68), the bounds on $T_{N,1}$ and $T_{N,2}$ transfer to

\[ |\partial_s\theta_{n,N}(x_{n,N}(s))| \le C(s_0)\exp(-s) \]

uniformly for $s \ge s_0$. On the other hand, we note that

\[ \psi_\tau'(s) = \frac{1}{\sqrt2}\rho_N\,\partial_s\theta_{n-1,N-1}(x_{n-1,N-1}(s)), \]

with $\rho_N = 1 + O(N^{-1})$. Thus, (73) implies the desired bound on $|\psi_\tau'|$ in (35).

5.3.2. Bound for $|\psi_\tau'(s) - G'(s)|$

By the triangle inequality, we bound $|\psi_\tau'(s) - G'(s)|$ as

\[
\begin{aligned}
|\psi_\tau'(s) - G'(s)| \le{}& \frac{1}{\sqrt2}|\rho_N - 1|\,|\partial_s\theta_{n-1,N-1}(x_{n-1,N-1}(s))| \\
&+ \frac{1}{\sqrt2}|\partial_s\theta_{n-1,N-1}(x_{n-1,N-1}(s)) - \mathrm{Ai}'(s)|.
\end{aligned} \tag{74}
\]

As $\rho_N = 1 + O(N^{-1})$, by (73), we bound the first term by $C(s_0)N^{-1}\exp(-s)$. In what
follows, to bound the second term in (74), we focus on $|\partial_s\theta_{n,N}(x_{n,N}(s)) - \mathrm{Ai}'(s)|$, which
can first be split into two parts as:



\d s 6 n , N (x niN (s)) - Ai'(s)| 



< <Tn,NFn,N( x n,N(s)) 

= Tn,i( s ) +Tn,2(s)- 



H>n,N 
Xn,N(s) 



Ai'W 



<^7i,NFn,N(x n ,N(s)) 



fJ-n.N 



<jv( s ) 



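The $\mathcal{T}_{N,1}(s)$ term. For this term, we separate the arguments on $I_{1,N} = [s_0, s_1N^{1/6})$ and
$I_{2,N} = [s_1N^{1/6}, \infty)$.

Case $s \in I_{1,N}$. On $I_{1,N}$, we decompose $\mathcal{T}_{N,1}(s)$ as $\mathcal{T}_{N,1}(s) = |\sum_{i=1}^{5}\mathcal{D}^i_{n,N}|$ with
$\mathcal{D}^i_{n,N} = D^i_{n,N}\tilde\mu_{n,N}/x_{n,N}(s)$ for $i = 1, 2$ and $4$, and

\[
\mathcal{D}^3_{n,N} = r_N\frac{\tilde\mu_{n,N}}{x_{n,N}(s)}\bigl[\mathrm{Ai}'(\kappa_N^{2/3}\zeta) - \mathrm{Ai}'(s)\bigr], \qquad
\mathcal{D}^5_{n,N} = \Bigl[r_N\frac{\tilde\mu_{n,N}}{x_{n,N}(s)} - 1\Bigr]\mathrm{Ai}'(s).
\]

Observe that $|\tilde\mu_{n,N}/x_{n,N}(s)| \le 2$ on $I_{1,N}$. Thus, by previous bounds on $D^i_{n,N}$, we obtain
that, for $i = 1, 2$ and $4$, $|\mathcal{D}^i_{n,N}| \le C(s_0)N^{-2/3}\exp(-s)$.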

Consider $\mathcal{D}^3_{n,N}$. By the Taylor expansion, for some $s^*$ between $\kappa_N^{2/3}\zeta$ and $s$,

\[ |\mathrm{Ai}'(\kappa_N^{2/3}\zeta) - \mathrm{Ai}'(s)| \le |\mathrm{Ai}''(s^*)|\,|\kappa_N^{2/3}\zeta - s| = |s^*\mathrm{Ai}(s^*)|\,|\kappa_N^{2/3}\zeta - s|, \]

where the equality comes from the identity $\mathrm{Ai}''(s) = s\mathrm{Ai}(s)$. By Lemma 3, we have that
$|\kappa_N^{2/3}\zeta - s| \le CN^{-2/3}s^2$, and that $s^*$ lies between $\frac12 s$ and $\frac32 s$. The latter, together with
Lemma 1, implies that, for $s \ge 0$,

\[ |s^*\mathrm{Ai}(s^*)| \le C\exp(-3s/2). \]

If $s_0 < 0$, we then have $s^* \in [\frac32 s, 0]$, and hence we can replace $C$ on the right-hand side
with $C(s_0) = \max\{C, \max_{s\in[3s_0/2,0]}|s\mathrm{Ai}(s)|\}$. Observe that $r_N = 1 + O(N^{-1})$ and that
$|\tilde\mu_{n,N}/x_{n,N}(s)| \le 2$. We thus conclude that

\[ |\mathcal{D}^3_{n,N}| \le C(s_0)N^{-2/3}s^2\exp(-3s/2) \le C(s_0)N^{-2/3}\exp(-s). \]
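Switch to $\mathcal{D}^5_{n,N}$. We first note that

\[
\Bigl|r_N\frac{\tilde\mu_{n,N}}{x_{n,N}(s)} - 1\Bigr|
\le r_N\Bigl|\frac{\tilde\mu_{n,N}}{x_{n,N}(s)} - 1\Bigr| + |r_N - 1|
= r_N\Bigl|\frac{s}{s + \tilde\mu_{n,N}/\tilde\sigma_{n,N}}\Bigr| + |r_N - 1|
\le CN^{-2/3}|s| + CN^{-1}.
\]

The last inequality holds as $\tilde\sigma_{n,N}/\tilde\mu_{n,N} = O(N^{-2/3})$, $r_N = 1 + O(N^{-1})$, and for large $N$,
$|s + \tilde\mu_{n,N}/\tilde\sigma_{n,N}| \ge \frac12\tilde\mu_{n,N}/\tilde\sigma_{n,N}$ uniformly for $s \in I_{1,N}$. On the other hand, Lemma 1
implies that $|\mathrm{Ai}'(s)| \le C(s_0)\exp(-3s/2)$. Putting the two parts together, we obtain

\[ |\mathcal{D}^5_{n,N}| \le C(s_0)N^{-2/3}(|s| + CN^{-1/3})\exp(-3s/2) \le C(s_0)N^{-2/3}\exp(-s). \]

Assembling all the bounds on the $\mathcal{D}^i_{n,N}$'s, we obtain that, on $I_{1,N}$,

\[ \mathcal{T}_{N,1}(s) \le C(s_0)N^{-2/3}\exp(-s). \]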
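The Taylor step in the case above relies on the Airy identity $\mathrm{Ai}''(s) = s\mathrm{Ai}(s)$. The following is a minimal numerical sketch of that identity, not part of the original argument. It assumes the standard Maclaurin series of Ai (with the known values of $\mathrm{Ai}(0)$ and $\mathrm{Ai}'(0)$) and a central finite difference; the helper names `airy_ai`, `second_derivative` and `max_identity_gap` are hypothetical.

```python
# Assumed constants: the standard values Ai(0) and Ai'(0).
AI0 = 0.3550280538878172
AIP0 = -0.2588194037928068


def airy_ai(x, terms=60):
    """Ai(x) from its Maclaurin series; adequate for moderate |x| only."""
    f_term, g_term = 1.0, x          # leading terms of the two series solutions
    f_sum, g_sum = f_term, g_term
    for k in range(terms):
        f_term *= x ** 3 / ((3 * k + 2) * (3 * k + 3))
        g_term *= x ** 3 / ((3 * k + 3) * (3 * k + 4))
        f_sum += f_term
        g_sum += g_term
    return AI0 * f_sum + AIP0 * g_sum


def second_derivative(func, x, h=1e-3):
    """Central finite-difference approximation of func''(x)."""
    return (func(x + h) - 2.0 * func(x) + func(x - h)) / (h * h)


def max_identity_gap(points):
    """Largest value of |Ai''(s) - s * Ai(s)| over the given points."""
    return max(abs(second_derivative(airy_ai, s) - s * airy_ai(s)) for s in points)


if __name__ == "__main__":
    print(max_identity_gap([-2.0, -1.0, 0.0, 0.5, 1.0, 2.0]))
```

The series is only meant for moderate $|s|$; for large positive $s$ it suffers from cancellation, so it should not be used to probe the exponential tail bounds of Lemma 1.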

Case $s \in I_{2,N}$. In this case, we could act more heavy-handedly. In particular, by the
asymptotics of $T_{N,1}(s)$ on $I_{2,N}$ and Lemma 1, we have

\[
\begin{aligned}
\mathcal{T}_{N,1}(s) &\le \Bigl|\tilde\sigma_{n,N}F_{n,N}(x_{n,N}(s))\frac{\tilde\mu_{n,N}}{x_{n,N}(s)}\Bigr| + |\mathrm{Ai}'(s)| \le CN^{-2/3}\exp(-s) + C\exp(-3s/2) \\
&\le CN^{-2/3}\exp(-s) + CN^{-2/3}s^4\exp(-3s/2) \le CN^{-2/3}\exp(-s).
\end{aligned}
\]
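The last two inequalities in the display above trade a polynomial factor for part of the exponential decay. The following short calculation is a sketch of that step; the explicit numerical constant is computed here for illustration and is not taken from the source.

```latex
\[
  s \ge s_1 N^{1/6},\quad s_1 \ge 1
  \;\Longrightarrow\; s^4 \ge N^{2/3}
  \;\Longrightarrow\; 1 \le N^{-2/3}s^4 ,
\]
\[
  \sup_{s \ge 0} s^4 e^{-s/2} = 8^4 e^{-4} \approx 75.0
  \;\Longrightarrow\;
  N^{-2/3}s^4\exp(-3s/2) \le 75.1\,N^{-2/3}\exp(-s).
\]
```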
The $\mathcal{T}_{N,2}(s)$ term. The $\mathcal{T}_{N,2}(s)$ term is the same as $T_{N,2}(s)$ defined previously in the
study of $\partial_s\theta_{n,N}(x_{n,N}(s))$ and hence we quote the bound derived there directly as

\[ \mathcal{T}_{N,2}(s) \le C(s_0)N^{-2/3}\exp(-s) \quad \text{for all } s \ge s_0. \]

Summing up. Combining the bounds on $\mathcal{T}_{N,1}$ and $\mathcal{T}_{N,2}$, we have, uniformly for $s \ge s_0$,

\[ |\partial_s\theta_{n,N}(x_{n,N}(s)) - \mathrm{Ai}'(s)| \le C(s_0)N^{-2/3}\exp(-s). \]

By the discussion following (74), we obtain the desired bound on $|\psi_\tau'(s) - G'(s)|$ in (37).

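5.3.3. Improved bound for $|\psi_\tau - G|$

The bound on $|\psi_\tau'(s) - G'(s)|$, together with (67), can lead to a tighter bound for $|\psi_\tau(s) -
G(s)|$ as the following:

\[
\begin{aligned}
|\psi_\tau(s) - G(s)| &= \Bigl|\int_s^{2s}[\psi_\tau'(t) - G'(t)]\,dt - [\psi_\tau(2s) - G(2s)]\Bigr| \\
&\le \int_s^{2s}|\psi_\tau'(t) - G'(t)|\,dt + |\psi_\tau(2s) - G(2s)| \\
&\le \int_s^{2s}C(s_0)N^{-2/3}e^{-t}\,dt + C(s_0)N^{-2/3}\exp(-s) \le C(s_0)N^{-2/3}\exp(-s).
\end{aligned} \tag{75}
\]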

This is exactly what we claimed in Proposition 2. 
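For concreteness, the integral appearing in (75) can be evaluated in closed form. The lines below are a sketch for $s \ge 0$; for $s \in [s_0, 0)$ the same quantities stay bounded on a bounded interval, so it is assumed here that they are absorbed by enlarging $C(s_0)$.

```latex
\[
  \int_s^{2s} e^{-t}\,dt = e^{-s} - e^{-2s} \le e^{-s}, \qquad s \ge 0,
\]
\[
  \int_s^{2s} C(s_0)N^{-2/3}e^{-t}\,dt + C(s_0)N^{-2/3}e^{-s}
  \le 2\,C(s_0)N^{-2/3}e^{-s}.
\]
```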

5.4. Asymptotics for quantities related to $\phi_\tau(s)$

In this part, we employ a trick in [15] to transfer the bounds on the quantities related
to $\psi_\tau$ to those related to $\phi_\tau$.

Recall that, for $\tilde\rho_N = 1 + O(N^{-1})$ (see Section A.5 for its proof),

"M S ) = — ^PNFn-2,N{Xn-l,N-l(s))- 



2 s 



V2 a;„_i j j V _i(sj 

If the $x_{n-1,N-1}(s)$ term on the right-hand side were $x_{n-2,N}(s)$, then all the bounds we
have proved for $\psi_\tau$ would also be valid for $\phi_\tau$. As this is not the case, we introduce a new
independent variable $s'$ as:

\[ x_{n-1,N-1}(s) = x_{n-2,N}(s'), \tag{76} \]

that is, $s' = (\tilde\mu_{n-1,N-1} - \tilde\mu_{n-2,N})/\tilde\sigma_{n-2,N} + s\tilde\sigma_{n-1,N-1}/\tilde\sigma_{n-2,N}$. (The readers are expected
not to confuse it with the $s'$ that previously appeared in Section 3.1.) Then $\phi_\tau$
can be rewritten as

^( S ) = —^PNF n -2.N(x n ~2.N(s')) ™ V7T = — ^=PJV^n-2,Jv(^n-2,Ar(s'))- 

V2 x„_ 2 ,at(s') V2 
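Recalling the definition of $\Delta_N$ in (33), we have $s' - s = \Delta_N + [\tilde\sigma_{n-1,N-1}\tilde\sigma_{n-2,N}^{-1} - 1]s$, with

\[ \Delta_N = O(N^{-1/3}), \qquad 1 \le \tilde\sigma_{n-1,N-1}\tilde\sigma_{n-2,N}^{-1} = 1 + O(N^{-1}). \tag{77} \]

Bounds for $|\phi_\tau(s)|$ and $|\phi_\tau'(s)|$

Recall previous bounds on $|\theta_{n,N}(x_{n,N}(s))|$ and $|\partial_s\theta_{n,N}(x_{n,N}(s))|$. Together with (77),
they imply that, for all $s \ge s_0$,

\[ |\phi_\tau(s)| \le C(s_0)\exp(-s') \le C(s_0)\exp(-s) \]

and

\[
\begin{aligned}
|\phi_\tau'(s)| &= \frac{1}{\sqrt2}\tilde\rho_N|\partial_s\theta_{n-2,N}(x_{n-2,N}(s'))| \\
&= \frac{1}{\sqrt2}\tilde\rho_N|\partial_{s'}\theta_{n-2,N}(x_{n-2,N}(s'))|\,\frac{ds'}{ds} \\
&\le C(s_0)\exp(-s')\frac{\tilde\sigma_{n-1,N-1}}{\tilde\sigma_{n-2,N}} \le C(s_0)\exp(-s).
\end{aligned}
\]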

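Bounds for $|\phi_\tau(s) - G_N(s)|$ and $|\phi_\tau'(s) - G_N'(s)|$

We consider $|\phi_\tau(s) - G_N(s)|$ in detail and the derivation for the bound on $|\phi_\tau'(s) - G_N'(s)|$
is essentially the same.

By the definition of $s'$ and the identity $\mathrm{Ai}''(s) = s\mathrm{Ai}(s)$, we obtain the Taylor expansion

\[
\begin{aligned}
G(s') &= G(s) + (s' - s)G'(s) + \tfrac12(s' - s)^2G''(s^*) \\
&= G_N(s) + \frac{1}{\sqrt2}\Bigl[\frac{\tilde\sigma_{n-1,N-1}}{\tilde\sigma_{n-2,N}} - 1\Bigr]s\,\mathrm{Ai}'(s) + \frac{1}{2\sqrt2}(s' - s)^2s^*\mathrm{Ai}(s^*),
\end{aligned}
\]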



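with $s^*$ lying in between $s$ and $s'$. By the previous discussion on $|\psi_\tau(s) - G(s)|$, this leads
to

\[
\begin{aligned}
|\phi_\tau(s) - G_N(s)| &\le C(s_0)N^{-2/3}\exp(-s') + CN^{-1}|s\mathrm{Ai}'(s)| + C(s' - s)^2|s^*\mathrm{Ai}(s^*)| \\
&\le C(s_0)N^{-2/3}\exp(-s) + C(s' - s)^2|s^*\mathrm{Ai}(s^*)|.
\end{aligned} \tag{78}
\]

To further bound the last term, we split $[s_0,\infty)$ into $I_{1,N} \cup I_{2,N}$. For $s \in I_{1,N}$,

\[ (s - s')^2 = \Bigl|\Delta_N + \Bigl(\frac{\tilde\sigma_{n-1,N-1}}{\tilde\sigma_{n-2,N}} - 1\Bigr)s\Bigr|^2 \le [CN^{-1/3} + CN^{-1}s]^2 \le (CN^{-2/3}) \wedge 1. \]

So $|s^*| \le |s| + 1$, and Lemma 1 implies that

\[ C(s - s')^2|s^*\mathrm{Ai}(s^*)| \le C(s_0)N^{-2/3}\exp(-s). \]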
On $I_{2,N}$, (77) implies that $s' \ge s/2$, and hence $s^* \ge s/2$. Together with Lemma 1, this
implies

\[ C(s' - s)^2|s^*\mathrm{Ai}(s^*)| \le Cs^{-4}\cdot|(s^*)^7\mathrm{Ai}(s^*)| \le CN^{-2/3}\exp(-s). \]

Therefore, we have shown that, for all $s \ge s_0$, the last term in (78) is further controlled
by $C(s_0)N^{-2/3}\exp(-s)$, which in turn gives the desired bound for $|\phi_\tau - G_N|$. It is not
hard to check that all the $C(s_0)$ functions in the above analysis could be continuous and
non-increasing.

Appendix 

In the Appendix, we collect technical details that led to some of the claims previously
made in the main text. Section A.5 gives proofs to properties of a number of constants.
Section A.6 works out the details on the choice of $s_1$, which was used to decompose the
interval $[s_0,\infty)$ in Section 5.

A.5. Properties of $\beta_N$, $\rho_N$, $\tilde\rho_N$, $\Delta_N$ and $\tilde\sigma_{n-1,N-1}/\tilde\sigma_{n-2,N}$

Property of $\beta_N$

We are to show that $\beta_N = \frac{1}{\sqrt2} + O(N^{-1})$. By definition, we know



£ 



1 f°° If 00 

- <t>r{s)ds=- (j)(x;a)dx 

2 v / 2r 1 /2( n ) J NK ' 

_ 2- a / 2 N 1 / A {n - l) 1 / 4 T 1 / 2 (n)T({l/2)(N + 3)) 
~ (N + l)rV2(iV + l)r((l/2)(n + 1)) ' 

Applying Stirling's formula $\Gamma(z) = (2\pi/z)^{1/2}(z/e)^z(1 + O(z^{-1}))$, we obtain that

(2n/n) 1 ^(n/e) n / 2 [4n/(N + S)]^ 2 [(N + 3)/(2e)p+ 3 )/ 2 



0. 



[27T/(^+l)] 1 /4[(7V + l)/ c ](A'+l)/2[4 7t /( n+1 )]l/2[( n+1 )/(2 e )](«+l)/2 

2-«/2AT 1 /4f n _ iU/4 
X N+l ( 1 + °( jy ")) 



1 / 1 \"/2/ 2 \ (JV+l)/2+3/4 



v^V n+l J V N + 1 



l-^TT U + TTTT (l + 0(«-')) 
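As a sanity check on the form of Stirling's formula used above, the following sketch compares $\Gamma(z)$ with the leading term $(2\pi/z)^{1/2}(z/e)^z$ on the log scale, using only the Python standard library. The function names are hypothetical; the point is only that the relative error behaves like a constant times $z^{-1}$, consistent with the $1 + O(z^{-1})$ factor.

```python
import math


def stirling_log_gamma(z):
    """log of (2*pi/z)^(1/2) * (z/e)^z, the leading Stirling term for Gamma(z)."""
    return 0.5 * math.log(2.0 * math.pi / z) + z * (math.log(z) - 1.0)


def stirling_relative_error(z):
    """Gamma(z) / Stirling(z) - 1, computed on the log scale to avoid overflow."""
    return math.expm1(math.lgamma(z) - stirling_log_gamma(z))


if __name__ == "__main__":
    for z in (10.0, 100.0, 1000.0):
        err = stirling_relative_error(z)
        # z * err should settle near a constant, reflecting the O(1/z) remainder
        print(z, err, z * err)
```

Working with `math.lgamma` rather than `math.gamma` keeps the comparison stable for large arguments, which is the regime of interest when $n$ and $N$ grow.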
Properties of $\rho_N$ and $\tilde\rho_N$

We want to show that $\rho_N, \tilde\rho_N = 1 + O(N^{-1})$. Consider $\rho_N$ first. By definition, we have

N^(n 1)V4 ? V2 n<N N l/* {n _ 1)1/4^ 

Pn = = ; ■ • 

P-n,N Mn-ljJV-l 

Plugging in the definition of $\tilde\sigma_{n-1,N-1}$ and $\tilde\mu_{n-1,N-1}$, we obtain that

-1/2 , 1 2X1/2 

~JN - 1/2 + ~j^TJ2, 



P7V = AT 1/4 (n - 1) 1/4 Ca/n - I + \f^l) ( ; : + ; 1 V 



V4 / m 1 \ 1/4 



TV V'V n-1 



= l + 0(A^ i ) 



N-1/2J yn-1/2^ 
For $\tilde\rho_N$, we have

. A^V-l)^ 4 ^^^ ^- 1 , jy - 1 iV 1/4 (n-l) 1 /M / _ 2 2 , Ar 

/OW = ; ■ = —: 

P-n-2,N CT n -2,N Pn-2,N 

T I q"\ -I/ 2 / 1 \ I/ 2 

<Jn-l.N-l 1/4( , 1/4 / /,. / 3 



ATV>_1)^ 7V+- + Jn- 



v n -2,N \V 2 V 2/ VV^+1/2 ^n - 3/2 

*n-2,N \N + l/2) U-3/2J + ^ J 



The last equality holds since $\tilde\sigma_{n-1,N-1}/\tilde\sigma_{n-2,N} = 1 + O(N^{-1})$ as claimed in (33), which
is to be shown below.

Property of $\Delta_N$

Recall the definition $\Delta_N = (\tilde\mu_{n-1,N-1} - \tilde\mu_{n-2,N})/\tilde\sigma_{n-2,N}$. By [10], A.1.2, the numerator
$\tilde\mu_{n-1,N-1} - \tilde\mu_{n-2,N} = O(1)$. For the denominator, let $\gamma_{n,N} = (n - \frac32)/(N + \frac12)$. We then
have



1/1 i \-i/3 



1 I L r 1 / 3\~V 1 1 



ct„_2,jv \V 2 V 2/ Vv^+1/2 Vn - 3/2 

1 "+" 7 n ,JV v 7 

The last equality holds since $\gamma_{n,N}$ is bounded below for all $n \ge N$. Combining the two
parts, we establish that $\Delta_N = O(N^{-1/3})$.

Property of $\tilde\sigma_{n-1,N-1}/\tilde\sigma_{n-2,N}$

We now switch to prove that

\[ 1 \le \tilde\sigma_{n-1,N-1}/\tilde\sigma_{n-2,N} = 1 + O(N^{-1}). \]

[10], A.1.3, showed that $\tilde\sigma_{n-1,N-1}/\tilde\sigma_{n-2,N} = 1 + O(N^{-1})$. On the other hand, we have
from the second-to-last display of [10], A.1.3, that



V On-2,N J [ n + N 



i+ 5u4) 



Both terms become greater than 1 when $N \ge N_0(\gamma)$, and hence $\tilde\sigma_{n-1,N-1}/\tilde\sigma_{n-2,N} \ge 1$
for large $N$. Actually, the inequality holds for any $n > N > 2$. However, what we have
proved here is sufficient for our argument in Section 5.4.
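The orders claimed for $\Delta_N$ and for the scaling ratio can be explored numerically. The sketch below is illustrative only and rests on an assumption: it takes the centering and scaling constants to have the form $\tilde\mu_{n,N} = (\sqrt{n+1/2} + \sqrt{N+1/2})^2$ and $\tilde\sigma_{n,N} = (\sqrt{n+1/2} + \sqrt{N+1/2})(1/\sqrt{n+1/2} + 1/\sqrt{N+1/2})^{1/3}$, which is suggested by the square-root terms appearing in this appendix but is not restated in this section. All function names are hypothetical.

```python
import math


def mu_tilde(n, N):
    """Assumed centering constant (sqrt(n + 1/2) + sqrt(N + 1/2))^2."""
    return (math.sqrt(n + 0.5) + math.sqrt(N + 0.5)) ** 2


def sigma_tilde(n, N):
    """Assumed scaling constant built from the same shifted square roots."""
    a, b = math.sqrt(n + 0.5), math.sqrt(N + 0.5)
    return (a + b) * (1.0 / a + 1.0 / b) ** (1.0 / 3.0)


def delta(n, N):
    """Delta_N = (mu_{n-1,N-1} - mu_{n-2,N}) / sigma_{n-2,N}."""
    return (mu_tilde(n - 1, N - 1) - mu_tilde(n - 2, N)) / sigma_tilde(n - 2, N)


def sigma_ratio(n, N):
    """sigma_{n-1,N-1} / sigma_{n-2,N}."""
    return sigma_tilde(n - 1, N - 1) / sigma_tilde(n - 2, N)


if __name__ == "__main__":
    for N in (50, 500, 5000):
        n = 2 * N  # an arbitrary aspect ratio n/N = 2, chosen for illustration
        d, r = delta(n, N), sigma_ratio(n, N)
        # rescaled quantities should stay bounded if the claimed orders hold
        print(N, d * N ** (1.0 / 3.0), (r - 1.0) * N)
```

Under this assumed form, the rescaled values printed for increasing $N$ stay bounded, in line with $\Delta_N = O(N^{-1/3})$ and a ratio of $1 + O(N^{-1})$ that is at least 1.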

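A.6. Choice of $s_1$ and its consequences

The key point in our choice of $s_1$ is to ensure that when $s \ge s_1$, we have

\[ \tfrac23\kappa_N\zeta^{3/2} \ge \tfrac32 s. \tag{79} \]

To this end, recall that in [15], A.8, one could choose $\tilde s_1(\gamma) = C(\gamma)(1 + \delta)$ with some
$\delta > 0$, such that when $s \ge \tilde s_1(\gamma)$, we have $\sqrt{f(\xi)} \ge 2/\tilde\sigma_{n,N}$ and hence if $s \ge 4\tilde s_1(\gamma)$,

\[ \tfrac23\kappa_N\zeta^{3/2} = \kappa_N\int_{\xi_+}^{\xi}\sqrt{f(z)}\,dz \ge \kappa_N\,(s - \tilde s_1(\gamma))\frac{\tilde\sigma_{n,N}}{\kappa_N}\cdot\frac{2}{\tilde\sigma_{n,N}} = 2(s - \tilde s_1(\gamma)) \ge \tfrac32 s. \]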

Moreover, by the analysis in [10], A.6.4, $\tilde s_1(\gamma)$ could be chosen independently of $\gamma$ and
hence we could define our $s_1$ to be

\[ s_1 = 4\tilde s_1, \]

which is independent of $\gamma$ and such that (79) holds. Moreover, we also require that $s_1 \ge 1$.
After specifying our choice of $s_1$, we spell out two of its consequences. The first of
them is that when $s \ge s_1 \ge 1$,

\[ \mathcal{E}^{-1}(\kappa_N^{2/3}\zeta) \le C\exp(-3s/2) \le C\exp(-s). \tag{80} \]

This is from the observation that $\mathcal{E}(x) \ge C\exp(2x^{3/2}/3)$ and hence

\[ \mathcal{E}^{-1}(\kappa_N^{2/3}\zeta) \le C\exp(-\tfrac23\kappa_N\zeta^{3/2}) \le C\exp(-3s/2). \]

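The other consequence is about the behavior of $s'$ defined in (76) when $s \ge s_1$. Remembering
that $s_1 \ge 1$, we then have that when $s \ge s_1$ and $N \ge N_0(\gamma)$,

\[ s' - \frac{s}{2} = \Delta_N + \Bigl(\frac{\tilde\sigma_{n-1,N-1}}{\tilde\sigma_{n-2,N}} - \frac12\Bigr)s \ge \Delta_N + \frac{s}{2} \ge \Delta_N + \frac12 > 0. \tag{81} \]

The last inequality holds when $N \ge N_0(\gamma)$, for $\Delta_N = O(N^{-1/3})$.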